Thursday July 24, 2025 1:15pm - 2:30pm PDT
Session: Advances in first-order methods for large-scale optimization
Chair: Jiawei Zhang
Cluster: Nonlinear Optimization

Talk 1: Acceleration by Random Stepsizes
Speaker: Jason Altschuler
Abstract: We show that for separable convex optimization, random stepsizes fully accelerate Gradient Descent (GD). Specifically, using i.i.d. stepsizes from the Arcsine distribution improves the iteration complexity from O(k) to O(\sqrt{k}), where k is the condition number. No momentum or other modifications to GD are required. This result is incomparable to the (deterministic) Silver Stepsize Schedule, which does not require separability but only achieves partial acceleration O(k^{\log_{1+\sqrt{2}} 2}) \approx O(k^{0.78}). Our starting point is a conceptual connection to potential theory: the variational characterization for the distribution of stepsizes with fastest convergence rate mirrors the variational characterization for the distribution of charged particles with minimal logarithmic potential energy. The Arcsine distribution solves both variational characterizations due to a remarkable "equalizing property" which in the physical context amounts to a constant potential over space, and in the optimization context amounts to an identical convergence rate for all quadratic functions. A key technical insight is that martingale arguments extend this phenomenon to all separable convex functions.
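
A minimal sketch, not taken from the talk: plain gradient descent on a separable convex quadratic, comparing the classical constant stepsize with i.i.d. random stepsizes. Taking each stepsize as the reciprocal of a draw from the Arcsine distribution on the curvature range [mu, L] is an assumption made here for illustration, and the problem sizes are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, L, d, T = 1.0, 100.0, 50, 400            # curvature range, dimension, iterations (placeholders)
    lam = np.linspace(mu, L, d)                  # separable quadratic f(x) = 0.5 * sum(lam * x**2)
    grad = lambda x: lam * x

    def run(x0, stepsizes):
        x = x0.copy()
        for h in stepsizes:
            x = x - h * grad(x)                  # plain GD step: no momentum, no other modifications
        return np.linalg.norm(x)

    x0 = rng.standard_normal(d)
    h_const = np.full(T, 2.0 / (mu + L))         # classical constant stepsize for comparison
    u = rng.random(T)
    arcsine = mu + (L - mu) * np.sin(np.pi * u / 2) ** 2   # inverse-CDF sample of Arcsine on [mu, L]
    h_rand = 1.0 / arcsine                       # stepsize = reciprocal of the Arcsine draw (assumption)
    print("constant stepsize, final distance:", run(x0, h_const))
    print("random stepsizes,  final distance:", run(x0, h_rand))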

Talk 2: Last Iterate Convergence of Incremental Methods and Applications in Continual Learning
Speaker: Xufeng Cai
Abstract: Incremental gradient and incremental proximal methods are a fundamental class of optimization algorithms used for solving finite sum problems, broadly studied in the literature. Yet, without strong convexity, their convergence guarantees have primarily been established for the ergodic (average) iterate. Motivated by applications in continual learning, we obtain the first convergence guarantees for the last iterate of both incremental gradient and incremental proximal methods, in general convex smooth (for both) and convex Lipschitz (for the proximal variants) settings. Our oracle complexity bounds for the last iterate nearly match (i.e., match up to a square-root-log or a log factor) the best known oracle complexity bounds for the average iterate, for both classes of methods. We further obtain generalizations of our results to weighted averaging of the iterates with increasing weights and to randomly permuted ordering of updates. We study incremental proximal methods as a model of continual learning with generalization and argue that a large amount of regularization is crucial for preventing catastrophic forgetting. Our last iterate guarantees generalize the state of the art for incremental methods, as such results were previously known only for overparameterized linear models, which correspond to convex quadratic problems with infinitely many solutions.
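
A minimal sketch, not taken from the talk: the incremental gradient method on a convex finite-sum least-squares problem, reporting both the last iterate and the ergodic (average) iterate that classical analyses cover. The data, stepsize, and cyclic ordering are placeholders; the abstract also treats proximal variants and randomly permuted orderings, which are not shown here.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 20, 5
    A = rng.standard_normal((n, d))
    b = rng.standard_normal(n)
    f = lambda x: 0.5 * np.mean((A @ x - b) ** 2)    # finite-sum objective (average of n components)

    x = np.zeros(d)
    x_avg = np.zeros(d)
    eta, epochs, t = 0.01, 200, 0
    for _ in range(epochs):
        for i in range(n):                           # one cyclic pass through the components
            grad_i = (A[i] @ x - b[i]) * A[i]        # gradient of the i-th component 0.5*(a_i^T x - b_i)^2
            x = x - eta * grad_i
            t += 1
            x_avg += (x - x_avg) / t                 # running (ergodic) average of the iterates
    print("last iterate objective:   ", f(x))
    print("average iterate objective:", f(x_avg))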

Talk 3: Unveiling Spurious Stationarity and Hardness Results for Bregman Proximal-Type Algorithms
Speaker: Jiajin Li
Abstract: Bregman proximal-type algorithms, such as mirror descent, are popular in optimization and data science for effectively exploiting problem structure and optimizing under tailored geometries. However, most existing convergence results rely on the gradient Lipschitz continuity of the kernel, which unfortunately excludes most commonly used cases, such as the Shannon entropy. In this paper, we reveal a fundamental limitation of these methods: Spurious stationary points inevitably arise when the kernel is not gradient Lipschitz. The existence of these spurious stationary points leads to an algorithm-dependent hardness result: Bregman proximal-type algorithms cannot escape from a spurious stationary point within any finite number of iterations when initialized from that point, even in convex settings. This limitation stems from the lack of a well-defined stationarity measure based on the Bregman divergence for non-gradient-Lipschitz kernels. Although some extensions attempt to address this issue, we demonstrate that they still fail to reliably distinguish between stationary and non-stationary points for such kernels. Our findings underscore the need for new theoretical tools and algorithms in Bregman geometry, paving the way for further research.
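
A minimal sketch, not taken from the talk: mirror descent on the probability simplex with the Shannon-entropy kernel, the standard example of a kernel whose gradient is not Lipschitz (it blows up at the simplex boundary). The linear objective and stepsize are placeholders chosen only to show the Bregman proximal update in this geometry.

    import numpy as np

    rng = np.random.default_rng(2)
    d = 10
    c = rng.standard_normal(d)
    grad = lambda x: c                       # linear objective f(x) = <c, x> over the simplex

    x = np.full(d, 1.0 / d)                  # start at the uniform distribution
    eta = 0.1
    for _ in range(500):
        # Bregman proximal step with kernel h(x) = sum_i x_i*log(x_i) reduces to a
        # multiplicative (exponentiated-gradient) update followed by normalization.
        x = x * np.exp(-eta * grad(x))
        x /= x.sum()
    print("iterate (concentrates where c is smallest):", np.round(x, 3))
    print("objective value:", float(c @ x))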

Speakers

Jiawei Zhang
Jason Altschuler
Xufeng Cai
Jiajin Li
Taper Hall (THH) 102, 3501 Trousdale Pkwy, Los Angeles, CA 90089
