Session: Randomized algorithms with applications to machine learning
Chair: Laurent Condat
Cluster: Optimization For Data Science

Talk 1: Subspace Optimization for Large Language Models with Convergence Guarantees
Speaker: Kun Yuan
Abstract: Subspace optimization algorithms, with GaLore (Zhao et al., 2024) as a representative method, have gained popularity for pre-training or fine-tuning large language models (LLMs) due to their memory efficiency. However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we unexpectedly discover that GaLore does not always converge to the optimal solution and substantiate this finding with an explicit counterexample. We then investigate the conditions under which GaLore can achieve convergence, demonstrating that it does so either in deterministic scenarios or when using a sufficiently large mini-batch size. More significantly, we introduce GoLore (Gradient random Low-rank projection), a novel variant of GaLore that provably converges in stochastic settings, even with standard batch sizes. Our convergence analysis can be readily extended to other sparse subspace optimization algorithms. Finally, we conduct numerical experiments to validate our theoretical results and empirically explore the proposed mechanisms.
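
To make the projection distinction in the abstract concrete, here is a minimal NumPy sketch: a GaLore-style basis comes from an SVD of the current stochastic gradient, while a GoLore-style basis is drawn at random, independently of the gradient noise. Function names and the QR orthonormalization are illustrative assumptions, not the papers' exact procedures.

    import numpy as np

    def galore_projection(G, r):
        # GaLore-style: subspace spanned by the top-r left singular vectors
        # of the current stochastic gradient; this data-dependent choice is
        # what can bias the subspace under gradient noise.
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        return U[:, :r]

    def golore_projection(m, r, rng):
        # GoLore-style: a random r-dimensional orthonormal basis, independent
        # of the gradient noise (QR orthonormalization is an assumption here).
        Q, _ = np.linalg.qr(rng.standard_normal((m, r)))
        return Q

    def subspace_step(W, G, P, lr):
        # Compress the m x n gradient into the r-dimensional subspace, then
        # map the low-rank update back to the full parameter space.
        return W - lr * (P @ (P.T @ G))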

Talk 2: Convergence of Gradient Descent with Linearly Correlated Noise and Applications to Differentially Private Learning
Speaker: Anastasia Koloskova
Abstract: We study gradient descent under linearly correlated noise. Our work is motivated by recent practical methods for optimization with differential privacy (DP), such as DP-FTRL, which achieve strong performance in settings where privacy amplification techniques are infeasible (such as in federated learning). These methods inject privacy noise through a matrix factorization mechanism, making the noise linearly correlated over iterations. We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise. We analyze the behavior of gradient descent in this setting, for both convex and non-convex functions. Our analysis is demonstrably tighter than prior work and recovers multiple important special cases exactly (including anticorrelated perturbed gradient descent). We use our results to develop new, effective matrix factorizations for differentially private optimization, and highlight the benefits of these factorizations theoretically and empirically.
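
As a rough illustration of the simplified setting, the sketch below runs gradient descent where the perturbation at step t is the t-th row of B @ Z, a fixed linear combination of i.i.d. Gaussian seeds, so the noise is linearly correlated across iterations. The mixing matrix B is a hypothetical stand-in for the matrix factorization mechanism: B = I recovers independent noise, and a bidiagonal B with entries 1 and -1 gives anticorrelated perturbed gradient descent.

    import numpy as np

    def gd_with_correlated_noise(grad, x0, B, lr, sigma, rng):
        # B is a T x T mixing matrix: the noise injected at step t is the
        # t-th row of B @ Z, a linear combination of i.i.d. Gaussian seeds.
        T = B.shape[0]
        d = x0.shape[0]
        Z = sigma * rng.standard_normal((T, d))  # independent noise seeds
        N = B @ Z                                # linearly correlated noise
        x = x0.copy()
        for t in range(T):
            x = x - lr * (grad(x) + N[t])
        return x

    # Anticorrelated perturbed gradient descent: noise_t = z_t - z_{t-1}.
    T = 100
    B_anti = np.eye(T) - np.eye(T, k=-1)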

Talk 3: Optimal Stochastic Algorithms for Distributionally Robust Learning
Speaker: Zaid Harchaoui
Abstract: We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses the f-DRO, Wasserstein-DRO, and spectral/L-risk formulations used in practice. We present Drago, a stochastic primal-dual algorithm that achieves a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems. The method combines both randomized and cyclic components with mini-batching, which effectively handles the unique asymmetric nature of the primal and dual problems in DRO. We support our theoretical results with numerical benchmarks in classification and regression.
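
For intuition about the primal-dual structure Drago works with, here is a toy sketch of a penalized DRO objective and one projected ascent step on the dual weights q. The chi-squared penalty, the step rule, and the simplex uncertainty set are illustrative assumptions, not the algorithm itself.

    import numpy as np

    def penalized_dro_objective(losses, q, nu):
        # Reweighted per-example losses minus a divergence penalty keeping q
        # near uniform (chi^2 penalty assumed; f-DRO, Wasserstein-DRO, and
        # spectral/L-risk formulations swap in other choices).
        n = losses.shape[0]
        return q @ losses - nu * 0.5 * n * np.sum((q - 1.0 / n) ** 2)

    def dual_ascent_step(losses, q, nu, step):
        # One projected gradient ascent step on the dual variable q.
        n = losses.shape[0]
        g = losses - nu * n * (q - 1.0 / n)
        return project_simplex(q + step * g)

    def project_simplex(v):
        # Euclidean projection onto the probability simplex (Duchi et al., 2008).
        n = v.shape[0]
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u + (1.0 - css) / np.arange(1, n + 1) > 0)[0][-1]
        tau = (1.0 - css[rho]) / (rho + 1)
        return np.maximum(v + tau, 0.0)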

Speakers
Laurent Condat

Senior Research Scientist, King Abdullah University of Science and Technology (KAUST)
Laurent Condat received a PhD in applied mathematics in 2006 from the Grenoble Institute of Technology, Grenoble, France. After a postdoc at the Helmholtz Zentrum München, Munich, Germany, he was hired in 2008 as a permanent researcher by the French National Center for Scientific Research...
Kun Yuan

Anastasia Koloskova

Zaid Harchaoui

Wednesday July 23, 2025 4:15pm - 5:30pm PDT
Joseph Medicine Crow Center for International and Public Affairs (DMC) 157, 3518 Trousdale Pkwy, Los Angeles, CA 90089
