Tuesday July 22, 2025 4:15pm - 5:30pm PDT
Session: Adaptive Stochastic Gradient Methods
Chair: Lin Xiao
Cluster: Nonlinear Optimization

Talk 1: The Road Less Scheduled
Speaker: Aaron Defazio
Abstract: Schedule-Free learning algorithms allow models to be trained in an any-time fashion, without compromising speed, memory, or final test metrics. I will dive into the details of how Schedule-Free learning works, show how it provides further quality-of-life improvements for practitioners, and describe our winning entry to the AlgoPerf algorithmic efficiency optimization challenge, which used Schedule-Free AdamW.
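For orientation (this note and the sketch below are editorial additions, not part of the abstract): Schedule-Free methods replace the learning-rate schedule with an interpolation-and-averaging scheme over iterates, so the model can be evaluated at any point in training. A minimal sketch of the SGD variant, assuming the interpolation form described in the Schedule-Free paper (gamma, beta, and the uniform averaging weight are illustrative defaults, not the reference implementation):

    import numpy as np

    def schedule_free_sgd(grad, x0, gamma=0.1, beta=0.9, steps=1000):
        # Illustrative sketch only.
        # z: base SGD sequence, x: running average used for evaluation,
        # y: interpolation point where the gradient is taken.
        z = np.array(x0, dtype=float)
        x = z.copy()
        for t in range(1, steps + 1):
            y = (1.0 - beta) * z + beta * x      # evaluate the gradient at y
            z = z - gamma * grad(y)              # plain SGD step on the base sequence
            c = 1.0 / t                          # uniform averaging weight
            x = (1.0 - c) * x + c * z            # x averages z_1, ..., z_t
        return x

    # Example: minimizing the quadratic f(w) = 0.5 * ||w||^2
    w = schedule_free_sgd(lambda w: w, np.ones(3))

The averaged iterate x is the one used for evaluation; Schedule-Free AdamW applies the same interpolation/averaging idea on top of AdamW-style steps.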

Talk 2: Analyzing AdaGrad Under Anisotropic Smoothness Assumptions
Speaker: Yuxing Liu
Abstract: Adaptive gradient methods have demonstrated remarkable success for training large-scale deep neural networks. However, the theoretical understanding of these methods, particularly in the large batch size regime (which is commonly used in practice), remains limited. In this talk, we aim to address this gap by introducing a generalized anisotropic smoothness assumption that better reflects the behavior of modern neural network training. Our theoretical analysis reveals that AdaGrad achieves provably faster convergence compared to standard gradient methods, even when large batch sizes are employed. These results provide valuable theoretical insights into the practical efficacy of adaptive gradient methods.
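As a brief illustrative note (not part of the abstract): the diagonal variant of AdaGrad rescales each coordinate by its own accumulated squared gradients, which is exactly the mechanism a per-coordinate (anisotropic) smoothness analysis can exploit, since different coordinates may have very different smoothness constants. A minimal sketch of diagonal AdaGrad, assuming the standard update (eta and eps are illustrative):

    import numpy as np

    def adagrad(grad, x0, eta=0.1, eps=1e-8, steps=1000):
        # Sketch of diagonal AdaGrad; not tied to the talk's exact setting.
        x = np.array(x0, dtype=float)
        v = np.zeros_like(x)                     # per-coordinate sum of squared gradients
        for _ in range(steps):
            g = grad(x)
            v += g * g                           # coordinate-wise accumulation
            x -= eta * g / (np.sqrt(v) + eps)    # flat coordinates get larger effective steps
        return x

One common way to formalize anisotropic smoothness, in the spirit of the abstract, is a coordinate-wise bound f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (1/2) Σ_i L_i (y_i − x_i)² with per-coordinate constants L_i; AdaGrad's per-coordinate step sizes adapt to the L_i without knowing them in advance.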

Talk 3: A Novel Approach to Loss Landscape Characterization without Over-Parametrization
Speaker: Antonio Orvieto
Abstract: Modern machine learning heavily depends on the effectiveness of optimization techniques. While deep learning models have achieved remarkable empirical results in training, their theoretical underpinnings remain somewhat elusive. Ensuring the convergence of optimization methods requires imposing specific structures on the objective function, which often do not hold in practice. One prominent example is the widely recognized Polyak-Lojasiewicz (PL) inequality, which has garnered considerable attention in recent years. However, validating such assumptions for deep neural networks entails substantial and often impractical levels of over-parametrization. In order to address this limitation, we propose a novel class of functions that can characterize the loss landscape of modern deep models without requiring extensive over-parametrization and can also include saddle points. Crucially, we prove that gradient-based optimizers possess theoretical guarantees of convergence under this assumption. Finally, we validate the soundness of our assumption through both theoretical analysis and empirical experimentation across a diverse range of deep learning models.
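As background (an editorial note, not part of the abstract): the Polyak-Lojasiewicz (PL) inequality referenced above requires that, for some μ > 0 and minimum value f*,

    (1/2) ‖∇f(x)‖² ≥ μ (f(x) − f*)   for all x.

Because the left-hand side can vanish only where f(x) = f*, the PL condition rules out saddle points and suboptimal stationary points; a function class meant to cover saddle points, as proposed in this talk, therefore has to relax it.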

Speakers

Lin Xiao

Lin Xiao is a Research Scientist at Facebook AI Research (FAIR) in Seattle, Washington. He received his BE from Beijing University of Aeronautics and Astronautics (Beihang University) and his PhD from Stanford University, and was a postdoctoral fellow in the Center for the Mathematics of...

Aaron Defazio

Research Scientist, Meta Platforms, Inc.
Aaron Defazio is a Research Scientist at Meta on the Fundamental AI Research Team, specializing in the field of optimization algorithms for machine learning. Aaron holds a PhD in Computer Science from ANU (Australian National University) and has a rich background in research, having...

Yuxing Liu


Antonio Orvieto

Taper Hall (THH) 101, 3501 Trousdale Pkwy, Los Angeles, CA 90089
