Session: Newton-ish and Higher-Order Methods
Chair: Nick Tsipinakis
Cluster:
Talk 1: Multilevel Regularized Newton Methods with Fast Convergence Rates
Speaker: Nick Tsipinakis
Abstract: We introduce new multilevel methods for solving large-scale unconstrained optimization problems. Specifically, the philosophy of multilevel methods is applied to Newton-type methods that regularize the Newton sub-problem using second-order information from a coarse (low-dimensional) sub-problem. The new regularized multilevel methods provably converge from any initialization point and enjoy faster convergence rates than Gradient Descent. In particular, for arbitrary functions with Lipschitz continuous Hessians, we show that their convergence rate interpolates between the rate of Gradient Descent and that of the cubic Newton method. If, additionally, the objective function is assumed to be convex, then the proposed method converges with the fast $\mathcal{O}(k^{-2})$ rate. Hence, since the updates are generated from a coarse model in low dimensions, the theoretical results of this paper yield a significant speed-up for Newton-type or preconditioned gradient methods in practical applications. Preliminary numerical results suggest that the proposed multilevel algorithms are significantly faster than current state-of-the-art methods.
References:
[1] K. Mishchenko, Regularized Newton method with global convergence, SIAM Journal on Optimization, 33 (2023), pp. 1440–1462.
[2] N. Doikov and Y. Nesterov, Gradient regularization of Newton method with Bregman distances, Mathematical Programming, 204 (2024), pp. 1–25.
[3] N. Tsipinakis and P. Parpas, A multilevel method for self-concordant minimization, Journal of Optimization Theory and Applications, (2024), pp. 1–51.
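To make the coarse-model regularization concrete, the following is a minimal Python sketch of one gradient-regularized Newton step computed in a low-dimensional subspace, loosely following the ideas in the abstract and in [1]–[3]. The function names, the random restriction operator R, and the constant hess_lipschitz are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def coarse_regularized_newton_step(grad, hess, x, m, hess_lipschitz=1.0, rng=None):
    """One multilevel regularized-Newton update for min f(x), x in R^n.

    grad, hess : callables returning the gradient (n,) and Hessian (n, n) at x
    m          : dimension of the coarse subspace (m << n)
    """
    rng = np.random.default_rng(rng)
    n = x.size
    g = grad(x)

    # Restriction operator R: R^n -> R^m (here a simple random subspace;
    # multigrid-style restrictions would exploit problem structure instead).
    R = rng.standard_normal((m, n)) / np.sqrt(m)

    # Coarse (Galerkin) second-order model: gradient and Hessian in R^m.
    g_c = R @ g
    H_c = R @ hess(x) @ R.T

    # Gradient regularization in the spirit of regularized Newton methods [1, 2]:
    # add sqrt(L * ||g||) * I so the coarse system is well conditioned.
    lam = np.sqrt(hess_lipschitz * np.linalg.norm(g))
    d_c = np.linalg.solve(H_c + lam * np.eye(m), -g_c)

    # Prolong the coarse direction back to R^n and take the step.
    return x + R.T @ d_c
```

Since the linear system is solved in dimension m rather than n, each iteration costs far less than a full Newton or cubic-Newton step, which is where the practical speed-up described in the abstract comes from.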
Talk 2: First-ish Order Methods: Hessian-aware Scalings of Gradient Descent
Speaker: Oscar Smee
Abstract: Gradient descent is the primary workhorse for optimizing large-scale problems in machine learning. However, its performance is highly sensitive to the choice of the learning rate. A key limitation of gradient descent is its lack of natural scaling, which often necessitates expensive line searches or heuristic tuning to determine an appropriate step size. In this paper, we address this limitation by incorporating Hessian information to scale the gradient direction. By accounting for the curvature of the function along the gradient, our adaptive, Hessian-aware scaling method provides a local unit step size guarantee, even in nonconvex settings. Near a local minimum that satisfies the second-order sufficient conditions, our approach achieves linear convergence with a unit step size. We show that our method converges globally under a significantly weaker version of the standard Lipschitz gradient smoothness assumption. Even when Hessian information is inexact, the local unit step size guarantee and global convergence properties remain valid under mild conditions. Finally, we validate our theoretical results empirically on a range of convex and nonconvex machine learning tasks, showcasing the effectiveness of the approach. Preprint: https://arxiv.org/abs/2502.03701
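As a rough Python sketch of the idea of scaling the gradient step by the curvature of f along the gradient direction: the curvature is estimated here with a finite-difference Hessian-vector product, and the function names and the fallback rule for nonpositive curvature are illustrative assumptions rather than the paper's exact method (see the preprint for the actual algorithm).

```python
import numpy as np

def hessian_aware_gradient_step(grad, x, fallback_lr=1e-2, fd_eps=1e-6):
    """One gradient step with a curvature-scaled step size."""
    g = grad(x)
    g_norm2 = g @ g
    if g_norm2 == 0.0:
        return x  # already at a stationary point

    # Curvature along the gradient, g^T H g, via a finite-difference
    # Hessian-vector product (keeps the cost close to first order).
    hvp = (grad(x + fd_eps * g) - g) / fd_eps
    curvature = g @ hvp

    if curvature > 0.0:
        # Minimizer of the local quadratic model along -g (a Cauchy-type step).
        alpha = g_norm2 / curvature
    else:
        # Negative or zero curvature along -g: fall back to a fixed learning rate.
        alpha = fallback_lr
    return x - alpha * g
```

Near a well-conditioned local minimum the quadratic model is accurate, so the computed alpha stops changing between iterations, which is the intuition behind the unit step size guarantee stated in the abstract.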