Session: Analysis and Design of First Order Methods for Convex Optimization
Chair: Bartolomeo Stellato
Cluster: Optimization For Data Science
Talk 1: Toward a grand unified theory of accelerations in optimization and machine learning
Speaker: Ernest Ryu
Abstract: Momentum-based acceleration of first-order optimization methods, first introduced by Nesterov, has been foundational to the theory and practice of large-scale optimization and machine learning. However, a fundamental understanding of such acceleration remains a long-standing open problem. In the past few years, several new acceleration mechanisms, distinct from Nesterov’s, have been discovered, and the similarities and dissimilarities among these new acceleration phenomena hint at a promising avenue of attack on the open problem. In this talk, we discuss the envisioned goal of developing a mathematical theory unifying the collection of acceleration mechanisms, as well as the challenges that remain to be overcome.
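For context only (this sketch is not taken from the talk), the momentum acceleration the abstract refers to can be illustrated by Nesterov's accelerated gradient method next to plain gradient descent; the step size 1/L and the momentum coefficient (k-1)/(k+2) below are the standard textbook choices for an L-smooth convex objective, and all function names are hypothetical.

```python
# Illustrative sketch, assuming an L-smooth convex objective f with gradient grad.
import numpy as np

def gradient_descent(grad, x0, L, num_iters):
    x = x0.copy()
    for _ in range(num_iters):
        x = x - grad(x) / L  # plain gradient step, O(1/k) suboptimality rate
    return x

def nesterov_accelerated_gradient(grad, x0, L, num_iters):
    x, y = x0.copy(), x0.copy()
    for k in range(1, num_iters + 1):
        x_next = y - grad(y) / L                       # gradient step at extrapolated point
        y = x_next + (k - 1) / (k + 2) * (x_next - x)  # momentum extrapolation, O(1/k^2) rate
        x = x_next
    return x

# Toy example: minimize the convex quadratic f(x) = 0.5 * x^T A x (minimizer at 0).
A = np.diag([1.0, 100.0])          # smoothness constant L = 100
grad = lambda x: A @ x
x0 = np.ones(2)
print(gradient_descent(grad, x0, L=100.0, num_iters=50))
print(nesterov_accelerated_gradient(grad, x0, L=100.0, num_iters=50))
```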
Talk 2: Data-driven performance estimation of first-order methods
Speaker: Jisun Park
Abstract: We introduce a data-driven approach to analyzing the probabilistic performance of first-order optimization algorithms. By combining Wasserstein Distributionally Robust Optimization with the performance estimation framework, we derive probabilistic performance guarantees for a wide range of first-order methods. We show that our method achieves significant reductions in conservatism compared to classical worst-case performance analysis tools.
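As a rough illustration of how the two ingredients might fit together (the notation below is introduced here for exposition and is not taken from the abstract), one can replace the classical worst-case performance estimate by an adversarial expectation over a Wasserstein ball centered at an empirical distribution of observed problem instances:

\[
  \sup_{\mathbb{Q} \,:\, W(\mathbb{Q},\, \widehat{\mathbb{P}}_N) \le \varepsilon} \;
  \mathbb{E}_{\xi \sim \mathbb{Q}} \big[\, p(\xi) \,\big],
\]

where \(\widehat{\mathbb{P}}_N\) is the empirical distribution of \(N\) observed instances \(\xi\), \(W\) is the Wasserstein distance, \(\varepsilon\) is the ambiguity radius, and \(p(\xi)\) is a per-instance performance measure (for example, the suboptimality after \(K\) iterations) evaluated within the performance estimation framework.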
Talk 3: Exact Verification of First-Order Methods for Quadratic Optimization via Mixed-Integer Programming
Speaker: Vinit Ranjan
Abstract: We present a mixed-integer programming framework to exactly verify the convergence of first-order methods for parametric convex quadratic and linear optimization. We formulate verification as a mathematical optimization problem in which we maximize the infinity norm of the fixed-point residual at the last iteration, subject to constraints on the parameters and initial iterates. Our approach uses affine and piecewise affine steps to exactly represent a wide range of gradient, projection, and proximal steps. We scale the mixed-integer formulation using advanced bound tightening and strong formulations for the piecewise affine steps. Numerical examples show worst-case residuals that are orders of magnitude lower and more closely match practical convergence.
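Schematically, and with notation introduced here purely for illustration, the verification problem described in the abstract can be written as

\[
  \max_{\theta \in \Theta,\; z^0 \in Z_0} \;\; \big\| z^{K} - z^{K-1} \big\|_\infty
  \quad \text{subject to} \quad
  z^{k+1} = T_\theta(z^k), \qquad k = 0, \dots, K-1,
\]

where \(z^k\) are the algorithm iterates, \(T_\theta\) is the affine or piecewise affine map encoding one step of the method for parameter \(\theta\), \(\Theta\) and \(Z_0\) are the admissible parameter and initial-iterate sets, and \(K\) is the last iteration. The piecewise affine pieces of \(T_\theta\) (projections, proximal operators) are what introduce binary variables, yielding a mixed-integer program whose optimal value is the exact worst-case fixed-point residual.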