Session: Recent Advances in Large-scale Optimization III
Chair: Salar Fattahi
Cluster: Nonlinear Optimization
Talk 1: Convexification of a class of optimization problems with step functions
Speaker: Andres Gomez
Abstract: We study nonlinear optimization problems with discontinuous step functions. This class of problems arises often in statistical applications, including modeling correct/incorrect predictions with linear classifiers, fairness, and nearly isotonic regression. We identify special classes of problems whose convex hulls can be computed explicitly, and discuss algorithmic implementations of the convexifications. Our computational results indicate that good statistical estimators can be obtained from optimal solutions of the convex relaxations, and that exact methods can be substantially improved in some cases.
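The short Python sketch below illustrates the kind of step-function objective the abstract refers to: counting correct/incorrect predictions of a linear classifier (the 0-1 loss), together with a generic convex surrogate (the hinge loss). The data, the surrogate, and the subgradient scheme are illustrative assumptions; the hinge bound is a standard convex upper bound, not the explicit convex-hull construction discussed in the talk.

```python
# Illustration only: step-function (0-1) objective vs. a generic convex surrogate.
# This is NOT the talk's convexification; the hinge loss is just a familiar
# pointwise convex upper bound on the step function 1[y_i * x_i^T w <= 0].
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features = 200, 5
X = rng.standard_normal((n_samples, n_features))
w_true = rng.standard_normal(n_features)
y = np.sign(X @ w_true + 0.3 * rng.standard_normal(n_samples))

def zero_one_loss(w):
    """Sum of step functions 1[y_i * x_i^T w <= 0] (discontinuous, nonconvex)."""
    return np.sum(y * (X @ w) <= 0)

def hinge_loss(w):
    """Convex surrogate max(0, 1 - y_i * x_i^T w), a pointwise upper bound."""
    return np.sum(np.maximum(0.0, 1.0 - y * (X @ w)))

# Crude subgradient descent on the convex surrogate, then evaluate both objectives.
w = np.zeros(n_features)
for k in range(500):
    margins = y * (X @ w)
    subgrad = -(X.T @ (y * (margins < 1.0)))   # subgradient of the hinge sum
    w -= 0.01 / np.sqrt(k + 1) * subgrad

print("0-1 loss:", zero_one_loss(w), " hinge loss:", round(hinge_loss(w), 2))
```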
Talk 2: AGDA+: Proximal Alternating Gradient Descent Ascent Method with a Nonmonotone Adaptive Step-Size Search for Nonconvex Minimax Problems
Speaker: Xuan Zhang
Abstract: We consider double-regularized nonconvex-strongly concave (NCSC) minimax problems of the form (P): min_x max_y g(x) + f(x, y) − h(y), where g and h are closed convex, and f is L-smooth in (x, y) and strongly concave in y. We propose a proximal alternating gradient descent ascent method, AGDA+, that adaptively chooses nonmonotone primal-dual step sizes to compute an approximate stationary point of (P) without requiring knowledge of the global Lipschitz constant L. Using a nonmonotone step-size search (backtracking) scheme, AGDA+ stands out by exploiting the local Lipschitz structure and eliminating the need for precise tuning of hyper-parameters. AGDA+ achieves the optimal iteration complexity of O(ε^{-2}) and is the first step-size search method for NCSC minimax problems that requires only O(1) calls to ∇f per backtracking iteration. We also discuss how AGDA+ can easily be extended to search for the strong concavity modulus μ as well. Numerical experiments demonstrate its robustness and efficiency.
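As a rough illustration of the alternating descent-ascent template with a backtracking, nonmonotone primal step size that AGDA+ refines, here is a short Python sketch. The test problem, the acceptance test, and the step-size update rules below are assumptions made for illustration; they are not the authors' AGDA+ rules or their complexity-optimal parameters.

```python
# Schematic sketch of alternating gradient descent ascent with a backtracking,
# nonmonotone primal step size.  Problem instance and step-size rules are
# illustrative assumptions, not the authors' AGDA+ algorithm.
import numpy as np

rng = np.random.default_rng(0)
n, m, mu = 20, 15, 1.0
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# f(x, y) = sum(cos(x)) + y^T (A x - b) - (mu/2)||y||^2
# smooth in (x, y), nonconvex in x, mu-strongly concave in y; g = h = 0 here.
f = lambda x, y: np.sum(np.cos(x)) + y @ (A @ x - b) - 0.5 * mu * y @ y
grad_x = lambda x, y: -np.sin(x) + A.T @ y
grad_y = lambda x, y: (A @ x - b) - mu * y

x, y = np.zeros(n), np.zeros(m)
tau, sigma = 1.0, 1.0 / mu           # primal (adaptive) and dual step sizes

for k in range(300):
    y = y + sigma * grad_y(x, y)     # ascent step in y
    gx = grad_x(x, y)
    while True:                      # backtracking: test a local descent condition
        x_trial = x - tau * gx
        if f(x_trial, y) <= f(x, y) - 0.5 * tau * gx @ gx + 1e-12:
            break
        tau *= 0.5                   # shrink the step until the local test holds
    x = x_trial
    tau *= 1.25                      # nonmonotone: allow the step size to grow again

print("||grad_x|| at last iterate:", np.linalg.norm(grad_x(x, y)))
```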
Talk 3: Towards Weaker Variance Assumptions for Stochastic Optimization
Speaker: Ahmet Alacaoglu
Abstract: We revisit a classical assumption for analyzing stochastic gradient algorithms under which the squared norm of the stochastic subgradient (or the variance, for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent reappearance in the recent literature, its relationship to the weakest known variance assumptions for analyzing stochastic gradient algorithms, and its relevance even in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization we provide horizon-free algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints, we obtain rates for optimality measures that do not require boundedness of the feasible set.
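For readers unfamiliar with the Halpern iteration mentioned above, the Python sketch below shows the classical anchored update x_{k+1} = β_k x_0 + (1 − β_k) T(x_k) with β_k = 1/(k+2), applied to the gradient-descent map of a smooth convex quadratic. The problem instance and step size are illustrative assumptions; this is not the authors' stochastic algorithm or its variance analysis.

```python
# Minimal illustration of the classical Halpern (anchored) iteration,
#   x_{k+1} = beta_k * x_0 + (1 - beta_k) * T(x_k),  beta_k = 1/(k+2),
# applied to the nonexpansive gradient-descent map of a smooth convex quadratic.
# The instance below is an assumption for illustration only.
import numpy as np

rng = np.random.default_rng(2)
n = 30
M = rng.standard_normal((2 * n, n))
Q = M.T @ M / (2 * n)                # PSD Hessian of f(x) = 0.5 x^T Q x - c^T x
c = rng.standard_normal(n)
L = np.linalg.eigvalsh(Q).max()      # smoothness constant of f

T = lambda x: x - (1.0 / L) * (Q @ x - c)    # nonexpansive for L-smooth convex f

x0 = np.zeros(n)
x = x0.copy()
for k in range(2000):
    beta = 1.0 / (k + 2)                     # anchoring weight
    x = beta * x0 + (1.0 - beta) * T(x)      # Halpern (anchored) update

print("fixed-point residual ||x - T(x)||:", np.linalg.norm(x - T(x)))
```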