Session: Methods for Large-Scale Nonlinear Optimization II
Chair: Mike O'Neill
Cluster: Nonlinear Optimization
Talk 1: Stochastic-Gradient-based Interior-Point Methods
Speaker: Frank E. Curtis
Abstract: I will discuss some of our recent work on stochastic-gradient-based interior-point algorithms for solving constrained optimization problems, such as those arising in informed machine learning. The algorithms are single-loop in the sense that they do not require an inner criterion for updating the barrier parameter; rather, the barrier parameter is decreased according to a prescribed sequence. Convergence guarantees are established in both deterministic and stochastic settings, and the algorithms exhibit good practical performance in comparison to projected-gradient-based methods.
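The sketch below is a minimal illustration of the single-loop idea (not the algorithm from the talk): for a bound-constrained problem min f(x) s.t. x >= 0, a stochastic gradient of the log-barrier objective is used, the barrier parameter follows a prescribed decreasing sequence rather than an inner update test, and a fraction-to-the-boundary rule keeps iterates strictly feasible. The function name stochastic_barrier_sgd and all parameter choices are hypothetical.

```python
import numpy as np

def stochastic_barrier_sgd(grad_f_stoch, x0, n_iters=1000,
                           mu0=1.0, mu_decay=0.7, step0=0.1,
                           boundary_frac=0.99):
    """Single-loop stochastic log-barrier sketch for min f(x) s.t. x >= 0.

    grad_f_stoch(x) returns a stochastic gradient estimate of f at x.
    The barrier parameter follows the prescribed sequence
    mu_k = mu0 / (k+1)^mu_decay, so no inner loop or barrier-update
    criterion is needed.  (Illustrative only; not the talk's method.)
    """
    x = np.array(x0, dtype=float)
    assert np.all(x > 0), "start strictly feasible"
    for k in range(n_iters):
        mu_k = mu0 / (k + 1) ** mu_decay      # prescribed barrier sequence
        alpha_k = step0 / np.sqrt(k + 1)      # diminishing step size
        g = grad_f_stoch(x) - mu_k / x        # stochastic gradient of the barrier objective
        step = -alpha_k * g
        # fraction-to-the-boundary rule keeps iterates strictly feasible
        neg = step < 0
        if np.any(neg):
            max_t = np.min(-boundary_frac * x[neg] / step[neg])
            step *= min(1.0, max_t)
        x = x + step
    return x
```

With a noisy gradient oracle for, say, a convex quadratic, the iterates track the central path loosely as mu_k decreases, which is the behavior the prescribed-sequence design aims for.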
Talk 2: Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods
Speaker: Raghu Bollapragada
Abstract: In this talk, we discuss minimizing finite-sum and expectation objective functions using Hessian-averaging-based subsampled Newton methods. These methods accommodate gradient inexactness and maintain a fixed per-iteration Hessian approximation cost. Recent work (Na et al. 2023) showed that Hessian averaging achieves a fast $\mathcal{O}\left(\sqrt{\tfrac{\log k}{k}}\right)$ local superlinear convergence rate for strongly convex functions, but it requires exact gradients and strong convexity, which limits its practical use. To address this, we propose Hessian-averaged methods with adaptive sampling strategies that allow inexact gradients. For finite-sum problems, we use deterministic sampling, yielding global linear and sublinear convergence rates for strongly convex and nonconvex functions, respectively, and we derive an improved local superlinear rate of $\mathcal{O}\left(\tfrac{1}{k}\right)$. For expectation problems, we use stochastic sampling and derive global linear/sublinear rates and a local superlinear rate of $\mathcal{O}\left(\tfrac{1}{\sqrt{k}}\right)$. Additionally, we introduce scalable methods such as the diagonally-averaged Newton (Dan) method for large-scale problems. Numerical results show that Hessian averaging enhances convergence and achieves state-of-the-art performance on challenging tasks such as CIFAR100 classification with ResNets.
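As a rough illustration of the Hessian-averaging idea (not the methods of the talk), the sketch below takes damped subsampled Newton steps for a finite-sum objective, averaging the per-iteration Hessian samples so the Hessian cost stays fixed, while a geometrically growing gradient batch stands in for the adaptive-sampling test. Names such as hessian_averaged_newton, grad_batch, and hess_batch are placeholders.

```python
import numpy as np

def hessian_averaged_newton(grad_batch, hess_batch, n_data, x0,
                            n_iters=100, batch0=32, growth=1.5,
                            damping=1e-4, step=1.0, rng=None):
    """Sketch of a subsampled Newton method with Hessian averaging for a
    finite-sum objective (1/n) sum_i f_i(x).

    grad_batch(x, idx) / hess_batch(x, idx) return the sample-average
    gradient / Hessian over the indices idx.  The Hessian sample size is
    fixed per iteration; the gradient sample grows geometrically as a
    simple proxy for an adaptive-sampling rule.  (Illustrative only.)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    d = x.size
    H_bar = np.eye(d)                 # running Hessian average (overwritten at k = 0)
    grad_batch_size = float(batch0)
    for k in range(n_iters):
        # subsampled Hessian with a fixed per-iteration cost, then average
        idx_H = rng.choice(n_data, size=min(n_data, batch0), replace=False)
        H_k = hess_batch(x, idx_H)
        H_bar = (k * H_bar + H_k) / (k + 1)
        # inexact gradient from a growing sample
        g_size = min(n_data, int(grad_batch_size))
        idx_g = rng.choice(n_data, size=g_size, replace=False)
        g = grad_batch(x, idx_g)
        grad_batch_size *= growth
        # damped Newton step; the small damping keeps the system solvable
        p = np.linalg.solve(H_bar + damping * np.eye(d), -g)
        x = x + step * p              # a line search would normally set this step
    return x
```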
Talk 3: A Stochastic Objective-Function-Free Adaptive Regularization Method with Optimal Complexity
Speaker: Sadok Jerad
Abstract: A fully stochastic second-order adaptive-regularization method for unconstrained nonconvex optimization is presented that never computes the objective-function value, yet achieves the optimal complexity bound for finding first-order critical points. The method is fully adaptive, and the inexactness conditions required for convergence depend on the history of past steps. Numerical experiments on large binary classification problems illustrate the potential of the new method.
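A minimal sketch of the objective-function-free idea, under stated assumptions: each cubic-regularization step is accepted unconditionally (no function values, no ratio test), and the regularization weight is updated from the history of past step norms. The sigma update and the bisection-based subproblem solver below are illustrative proxies, not the method presented in the talk; grad and hess may return inexact estimates.

```python
import numpy as np

def cubic_step(g, H, sigma, lam_hi=1e8):
    """Approximately minimize g's + 0.5 s'Hs + (sigma/3)||s||^3 by bisection
    on the secular equation sigma*||s(lam)|| = lam, s(lam) = -(H + lam I)^{-1} g."""
    d = g.size
    lam_lo = max(0.0, -np.linalg.eigvalsh(H).min()) + 1e-12
    s = np.zeros(d)
    for _ in range(60):
        lam = 0.5 * (lam_lo + lam_hi)
        s = np.linalg.solve(H + lam * np.eye(d), -g)
        if sigma * np.linalg.norm(s) > lam:
            lam_lo = lam          # regularization too weak: raise lam
        else:
            lam_hi = lam
    return s

def offo_ar2(grad, hess, x0, n_iters=100, sigma0=1.0):
    """Objective-function-free adaptive-regularization sketch: every step is
    accepted without evaluating f, and the regularization weight grows with
    the accumulated step norms.  The sigma rule here is an illustrative
    proxy for the talk's history-dependent update."""
    x = np.array(x0, dtype=float)
    hist = 0.0
    for _ in range(n_iters):
        g, H = grad(x), hess(x)                   # possibly inexact estimates
        sigma_k = sigma0 * np.sqrt(1.0 + hist)    # history-dependent regularization
        s = cubic_step(g, H, sigma_k)
        x = x + s                                 # accept unconditionally
        hist += np.linalg.norm(s) ** 2
    return x
```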