Monday July 21, 2025 10:30am - 11:45am PDT
Session: Adaptive and Accelerated First-Order Methods
Chair: Wenzhi Gao

Talk 1: Gradient Descent as a Collaborative Game
Speaker: Wenzhi Gao
Abstract: We introduce a framework that uses online learning to accelerate the convergence of gradient-based methods: an online learning algorithm updates the stepsize in gradient descent, and the resulting scheme provably accelerates gradient-based methods. A key insight is to view gradient descent as a collaborative game between the stepsize scheduler and the optimization landscape, with both players working together for faster convergence. We also discuss implications of the framework, including global and local convergence properties and several extensions. Numerical experiments on deterministic convex and nonconvex problems demonstrate the promising performance of our method. Reference: https://arxiv.org/pdf/2411.01803
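
The abstract gives no pseudocode; as a rough illustration of the general idea of updating a gradient-descent stepsize with an online rule, here is a minimal Python sketch. The function name, the hypergradient-style update, and the constants are illustrative assumptions, not the authors' algorithm from the referenced paper.

```python
import numpy as np

def gd_with_online_stepsize(grad, x0, eta0=1e-2, beta=1e-4, iters=200):
    """Gradient descent whose scalar stepsize is adjusted by an online rule.

    Hedged illustration only (a generic hypergradient-style update),
    not the stepsize scheduler proposed in the talk.
    """
    x, eta = np.asarray(x0, dtype=float), eta0
    g_prev = grad(x)
    for _ in range(iters):
        x = x - eta * g_prev                  # standard GD step with current stepsize
        g = grad(x)
        # Online update: grow eta when consecutive gradients align,
        # shrink it when they point in opposing directions.
        eta = max(eta + beta * float(np.dot(g, g_prev)), 1e-12)
        g_prev = g
    return x

# Example: minimize an ill-conditioned quadratic f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 10.0, 100.0])
x_opt = gd_with_online_stepsize(lambda x: A @ x, x0=np.ones(3))
```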

Talk 2: An Adaptive and Parameter-Free Nesterov's Accelerated Gradient Method
Speaker: Jaewook J. Suh
Abstract: In this talk, we introduce AdaNAG, an adaptive accelerated gradient method based on Nesterov's accelerated gradient (NAG). The algorithm is line-search-free, parameter-free, and achieves the accelerated convergence rates $f(x_k) - f_\star = O(1/k^2)$ and $\min_{i\in\{1,\dots,k\}} \|\nabla f(x_i)\|^2 = O(1/k^3)$ for an $L$-smooth convex function $f$. We provide a Lyapunov analysis for the convergence proof of AdaNAG, which additionally enables us to propose a novel adaptive gradient descent (GD) method, AdaGD. AdaGD achieves the non-ergodic convergence rate $f(x_k) - f_\star = O(1/k)$, like the original GD. Motivated by the relationship between the parameter choice and the convergence guarantee of AdaGD, we obtain a generalized AdaNAG that provides a practically useful variant of AdaNAG. We provide numerical results showing that our method outperforms other recently proposed adaptive methods in certain scenarios.
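
For orientation, the sketch below shows a generic adaptive NAG iteration in which the stepsize comes from a crude local Lipschitz estimate. It is not AdaNAG: the talk's parameter-free stepsize rule and its guarantees come from the Lyapunov analysis described above, and everything here (names, the estimation rule, constants) is an assumption made purely for illustration.

```python
import numpy as np

def adaptive_nag_sketch(grad, x0, L0=1.0, iters=300):
    """Generic NAG with a local Lipschitz estimate (illustrative, not AdaNAG)."""
    x = x_prev = np.asarray(x0, dtype=float)
    y_prev, g_prev = x.copy(), grad(x)
    t_prev = t = 1.0
    L = L0
    for _ in range(iters):
        y = x + ((t_prev - 1.0) / t) * (x - x_prev)   # momentum / extrapolation step
        g = grad(y)
        # Crude local Lipschitz estimate from successive extrapolated points.
        dy = np.linalg.norm(y - y_prev)
        if dy > 0:
            L = max(L, np.linalg.norm(g - g_prev) / dy)
        x_prev, x = x, y - g / L                      # gradient step with stepsize 1/L
        y_prev, g_prev = y, g
        t_prev, t = t, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    return x
```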

Talk 3: Stochastic Gradient Methods with Block Coordinate Optimistic Stepsizes
Speaker: Tao Jiang
Abstract: Ill-conditioning is a major challenge for optimization with first-order methods. This is especially the case for stochastic optimization, where preconditioners in the classical sense are hard to construct due to the nature of stochastic gradients. We propose a block-coordinate stepsize rule that can effectively combat ill-conditioning as well as inhomogeneous noise in the stochastic setting. Our method is motivated by minimizing the expected distance to an optimal point during each iteration. Specifically, we use optimistic stepsizes, chosen as if the expected search directions (e.g., stochastic gradients with or without momentum) along each coordinate always point to the optimal point. These stepsizes rely on online estimates of the second moments of the coordinate-wise search directions. The popular Adam algorithm can be interpreted as a heuristic for such an estimation. Compared with Adam, our method requires fewer hyperparameters, obtains similar or better performance, and is numerically more stable.
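
To make the notion of coordinate-wise stepsizes driven by online second-moment estimates concrete, here is an Adam/RMSProp-style sketch. The talk's optimistic stepsize rule is derived differently (from minimizing the expected distance to an optimum), so this code is only an assumed illustration of the general idea, with all names and constants chosen for the example.

```python
import numpy as np

def sgd_coordinatewise_scaling(stoch_grad, x0, alpha=0.1, beta2=0.99,
                               eps=1e-8, iters=1000):
    """SGD with per-coordinate stepsizes from online second-moment estimates.

    RMSProp/Adam-style sketch of coordinate-wise scaling; not the
    optimistic-stepsize rule proposed in the talk.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                        # running second-moment estimate
    for k in range(1, iters + 1):
        g = stoch_grad(x)
        v = beta2 * v + (1.0 - beta2) * g * g   # per-coordinate second moment
        v_hat = v / (1.0 - beta2 ** k)          # bias correction
        x = x - alpha * g / (np.sqrt(v_hat) + eps)
    return x
```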

Speakers

Wenzhi Gao

Ph.D. student, Stanford University
Second-year Ph.D. student at Stanford ICME, working on large-scale numerical optimization, first-order methods, and online decision-making problems.

Jaewook J. Suh

Rice University

Tao Jiang

Name: Dr. Slothington "Slow Convergence" McNapface
Title: Distinguished Professor of Continuous Optimization & Energy Minimization
Affiliation: The Lush Canopy Institute of Sluggish Algorithms
Bio: Dr. Slothington McNapface is a leading expert in continuous optimization, specializing...
Taper Hall (THH) 112 3501 Trousdale Pkwy, 112, Los Angeles, CA 90089
