Tuesday July 22, 2025 4:15pm - 5:30pm PDT
Session: Machine Learning Algorithms
Chair: Mark Schmidt
Cluster: Optimization Applications (Communication, Energy, Health, ML, ...)

Talk 1: Leveraging Variable Sparsity to Refine Pareto Stationarity in Multi-Objective Optimization
Speaker: Yaoliang Yu
Abstract: Gradient-based multi-objective optimization (MOO) is essential in modern machine learning, with applications in, e.g., multi-task learning, federated learning, algorithmic fairness, and reinforcement learning. In this work, we first reveal some limitations of Pareto stationarity, a widely accepted first-order condition for Pareto optimality, in the presence of sparse function-variable structures. Next, to account for such sparsity, we propose a novel solution concept termed Refined Pareto Stationarity (RPS), which we prove is always sandwiched between Pareto optimality and Pareto stationarity. We give an efficient partitioning algorithm that automatically mines the function-variable dependency and substantially trims non-optimal Pareto stationary solutions. Then, we show that gradient-based descent algorithms in MOO can be enhanced with our refined partitioning. In particular, we propose the Multiple Gradient Descent Algorithm with Refined Partition (RP-MGDA) as an example method that converges to RPS while still enjoying a similar per-step complexity and convergence rate. Lastly, we validate our approach through experiments on both synthetic examples and realistic application scenarios where distinct function-variable dependency structures appear. Our results highlight the importance of exploiting function-variable structure in gradient-based MOO and provide a seamless enhancement to existing approaches.
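
To make the blockwise idea concrete, here is a minimal sketch in which a standard MGDA-style min-norm step is applied separately to each block of a given function-variable partition. The partition, the Frank-Wolfe min-norm solver, and the helper names (min_norm_weights, blockwise_mgda_step) are illustrative assumptions; the sketch does not reproduce the partition-mining algorithm or the RP-MGDA method from the talk.

```python
# Illustrative sketch: a plain MGDA-style update applied per variable block.
# The partition (which objectives depend on which variables) is assumed given;
# this is not the talk's RP-MGDA, only a blockwise variant of standard MGDA.
import numpy as np

def min_norm_weights(G, n_iter=100):
    """Frank-Wolfe solve of min_{w in simplex} ||G^T w||^2 for gradients G of shape (m, d)."""
    m = G.shape[0]
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        g = G @ (G.T @ w)                 # gradient of the quadratic in w (up to a factor 2)
        i = int(np.argmin(g))             # best simplex vertex for the linearized objective
        d = -w
        d[i] += 1.0                       # Frank-Wolfe direction e_i - w
        Gw, Gd = G.T @ w, G.T @ d
        denom = float(Gd @ Gd)
        step = 0.0 if denom == 0.0 else float(np.clip(-(Gw @ Gd) / denom, 0.0, 1.0))
        w = w + step * d
    return w

def blockwise_mgda_step(x, grads, partition, lr=0.1):
    """One MGDA-style step per variable block.

    x         : (d,) current point
    grads     : list of per-objective gradients, each of shape (d,)
    partition : dict mapping a block (tuple of variable indices) to the list of
                objective indices that depend on those variables
    """
    x = x.copy()
    for block, objs in partition.items():
        idx = np.array(block)
        G = np.stack([grads[j][idx] for j in objs])   # gradients restricted to this block
        w = min_norm_weights(G)
        x[idx] -= lr * (G.T @ w)                      # common descent direction for the block
    return x

# Toy usage: f0 depends only on x[0:2], f1 only on x[2:4].
x = np.array([1.0, -2.0, 0.5, 3.0])
grads = [np.array([2.0, -4.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 6.0])]
x_next = blockwise_mgda_step(x, grads, {(0, 1): [0], (2, 3): [1]})
```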

Talk 2: High-dimensional Optimization with Applications to Compute-Optimal Neural Scaling Laws
Speaker: Courtney Paquette
Abstract: Given the massive scale of modern ML models, we now only get a single shot to train them effectively. This restricts our ability to test multiple architectures and hyper-parameter configurations. Instead, we need to understand how these models scale, allowing us to experiment with smaller problems and then apply those insights to larger-scale models. In this talk, I will present a framework for analyzing scaling laws in stochastic learning algorithms using a power-law random features model, leveraging high-dimensional probability and random matrix theory. I will then use this scaling law to address the compute-optimal question: How should we choose model size and hyper-parameters to achieve the best possible performance in the most compute-efficient manner?
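
The compute-optimal question can be illustrated with a toy calculation. Assuming a hypothetical Chinchilla-style loss surface L(N, D) = a/N^alpha + b/D^beta + c and a FLOPs budget C ≈ 6·N·D, one can sweep model sizes to find the best split of compute between parameters and data. The functional form, exponents, and constants below are placeholders for illustration, not results from the talk.

```python
# Toy compute-optimal allocation under an assumed power-law loss surface.
# All coefficients and exponents are illustrative placeholders.
import numpy as np

a, b, c = 400.0, 4000.0, 1.7        # hypothetical fitted coefficients
alpha, beta = 0.34, 0.28            # hypothetical scaling exponents

def loss(N, D):
    """Assumed loss as a function of parameter count N and training tokens D."""
    return a / N**alpha + b / D**beta + c

def compute_optimal_split(C, n_grid=2000):
    """Brute-force the model size N minimizing the loss under the budget C = 6*N*D."""
    N = np.logspace(6, 12, n_grid)  # candidate parameter counts
    D = C / (6.0 * N)               # tokens implied by the FLOPs budget
    L = loss(N, D)
    i = int(np.argmin(L))
    return N[i], D[i], L[i]

for C in [1e20, 1e22, 1e24]:
    N_star, D_star, L_star = compute_optimal_split(C)
    print(f"C={C:.0e}: N*={N_star:.2e}, D*={D_star:.2e}, loss={L_star:.3f}")
```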

Talk 3: A Robustness Metric for Distribution Shifts
Speaker: John Duchi
Abstract: We revisit the stability of optimizers in statistical estimation and stochastic optimization problems, but instead of providing guarantees on the stability of the minimizers themselves, we investigate which shifts to the underlying data-generating process perturb solutions the most. To do so, we develop some new mathematical tools for stability analyses, with guarantees beyond typical differentiable problems. We also make connections with statistical hypothesis testing and discovery, showing how these new results provide certificates of validity, or potential invalidity, of statistical estimates.
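
One elementary way to ask "which shift perturbs the solution the most" is through influence functions: for a smooth M-estimator, the first-order effect of reweighting the data is linear in the weights, so the worst mean-zero, unit-norm reweighting is a top singular direction of the influence matrix. The sketch below carries this out for ridge-regularized logistic regression; it is an illustrative baseline under standard smoothness assumptions, not the metric or the certificates developed in the talk.

```python
# Illustrative worst-case data reweighting (a crude proxy for a distribution shift)
# for a ridge-regularized logistic-regression estimator, via influence functions.
import numpy as np

def fit_logreg(X, y, lam=1e-2, n_iter=200, lr=0.5):
    """Plain gradient descent on the ridge-regularized logistic loss."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        theta -= lr * (X.T @ (p - y) / n + lam * theta)
    return theta

def worst_case_reweighting(X, y, theta, lam=1e-2):
    """Mean-zero sample reweighting that perturbs theta the most, to first order."""
    n, d = X.shape
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    G = X * (p - y)[:, None]                                        # per-sample gradients, (n, d)
    H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)    # Hessian at theta
    J = -np.linalg.solve(H, G.T).T / n                              # influence of sample i on theta
    J -= J.mean(axis=0, keepdims=True)                              # restrict to mean-zero reweightings
    U, S, _ = np.linalg.svd(J, full_matrices=False)
    return U[:, 0], S[0]                   # worst reweighting direction and its sensitivity

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) + 0.5 * rng.normal(size=200) > 0).astype(float)
theta = fit_logreg(X, y)
delta, sensitivity = worst_case_reweighting(X, y, theta)
print("largest first-order sensitivity of the estimate:", sensitivity)
```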

Speakers
Mark Schmidt

Yaoliang Yu

Courtney Paquette

John Duchi
Taper Hall (THH) 201, 3501 Trousdale Pkwy, Los Angeles, CA 90089
