Session: Minimax Optimization Algorithms
Chair: Jaeyeon Kim
Talk 1: Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements
Speaker: Jiseok Chae
Abstract: In minimax optimization, the extragradient (EG) method outperforms gradient descent-ascent in convex-concave (C-C) problems. Yet, stochastic EG (SEG) has seen limited success in C-C problems, especially in the unconstrained case. In this talk, we discuss the convergence properties of shuffling-based SEG in unconstrained finite-sum minimax problems, motivated by recent progress in shuffling-based stochastic methods. We first show that the random reshuffling and flip-flop shuffling sampling schemes, while successful in minimization problems, can each diverge in C-C problems when used alone. We then show how incorporating a simple "anchoring" trick yields our SEG with flip-flop anchoring (SEG-FFA) method, which successfully converges in C-C problems. We will also present upper and lower bounds in the strongly-convex-strongly-concave setting, demonstrating that SEG-FFA has a provably faster convergence rate than other shuffling-based methods. Reference to the publication(s): https://proceedings.neurips.cc/paper_files/paper/2024/hash/df658fe5097f65485ad80b06e6cb30dd-Abstract-Conference.html
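The sketch below is a minimal Python illustration of the two ingredients the abstract highlights: an extragradient inner loop that visits the components in flip-flop order (a sampled permutation followed by its reversal), and a simple anchoring step that pulls the end of each epoch back toward its starting point. The toy bilinear objective, step size, and anchoring weight are assumptions made for illustration; this is not the exact SEG-FFA recursion from the paper.

```python
# Illustrative sketch (not the authors' exact SEG-FFA recursion): one epoch of
# stochastic extragradient on a finite-sum bilinear saddle problem
#   min_x max_y (1/n) sum_i x^T A_i y,
# using flip-flop shuffling (a permutation followed by its reversal) and a
# simple anchoring step that pulls the epoch's end point toward its start.
import numpy as np

def saddle_grad(A_i, x, y):
    # Descent/ascent field for f_i(x, y) = x^T A_i y: (grad_x f_i, -grad_y f_i).
    return A_i @ y, -(A_i.T @ x)

def seg_ffa_epoch(A, x, y, eta, anchor_weight=0.5, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    x0, y0 = x.copy(), y.copy()                 # anchor: the epoch's starting point
    perm = rng.permutation(len(A))
    order = np.concatenate([perm, perm[::-1]])  # flip-flop: forward pass, then backward
    for i in order:
        # Extragradient: extrapolation step, then update from the midpoint.
        gx, gy = saddle_grad(A[i], x, y)
        x_half, y_half = x - eta * gx, y - eta * gy
        gx, gy = saddle_grad(A[i], x_half, y_half)
        x, y = x - eta * gx, y - eta * gy
    # Anchoring (illustrative weight): average the epoch's end point with its anchor.
    x = anchor_weight * x0 + (1 - anchor_weight) * x
    y = anchor_weight * y0 + (1 - anchor_weight) * y
    return x, y

# Usage: run a few epochs on a small random bilinear instance.
rng = np.random.default_rng(1)
A = [rng.standard_normal((3, 3)) for _ in range(5)]
x, y = rng.standard_normal(3), rng.standard_normal(3)
for _ in range(100):
    x, y = seg_ffa_epoch(A, x, y, eta=0.05, rng=rng)
print(np.linalg.norm(x), np.linalg.norm(y))
```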
Talk 2: Double-step alternating extragradient with timescale separation for finding local minimax points
Speaker: Kyuwon Kim
Abstract: In minimization, gradient descent converges to a local minimum and, under mild conditions, almost surely avoids strict saddle points. In contrast, minimax optimization lacks a comparable theory for finding local minimax (optimal) points. Recently, the two-timescale extragradient (EG) method has shown promise for finding local minimax points, improving on the two-timescale gradient descent-ascent method. However, it is not yet stable enough to find degenerate local minimax points, which are prevalent in the modern over-parameterized setting. We therefore propose a new double-step alternating update strategy that further improves the stability of the two-timescale EG method and remedies this issue. Under mild conditions, we show that the proposed method converges to local minimax points that all existing two-timescale methods fail to find.
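As a rough illustration of the pieces named in the abstract, the sketch below combines extragradient extrapolation, alternating x/y updates, and a timescale ratio tau that makes the y-step larger than the x-step, on a toy quadratic minimax problem. The update ordering, the value of tau, and the problem instance are assumptions for illustration only; this is not the talk's exact double-step alternating scheme.

```python
# Illustrative sketch only: extragradient extrapolation, alternating x/y updates,
# and a timescale ratio tau on the toy quadratic minimax problem
#   f(x, y) = 0.5 * x^T P x + x^T B y - 0.5 * y^T Q y.
import numpy as np

def grads(P, B, Q, x, y):
    # (grad_x f, grad_y f) for the quadratic above.
    return P @ x + B @ y, B.T @ x - Q @ y

def two_timescale_alt_eg_step(P, B, Q, x, y, eta, tau):
    # Extrapolation ("look-ahead") point, alternating: x first, then y uses x_half.
    gx, _ = grads(P, B, Q, x, y)
    x_half = x - eta * gx
    _, gy = grads(P, B, Q, x_half, y)
    y_half = y + tau * eta * gy          # ascent in y, with a larger timescale
    # Update step, evaluated at the extrapolated point, again alternating.
    gx, _ = grads(P, B, Q, x_half, y_half)
    x_new = x - eta * gx
    _, gy = grads(P, B, Q, x_new, y_half)
    y_new = y + tau * eta * gy
    return x_new, y_new

# Usage: iterate on a small random instance; tau > 1 separates the timescales.
rng = np.random.default_rng(0)
P, Q = np.eye(2), np.eye(2)
B = rng.standard_normal((2, 2))
x, y = rng.standard_normal(2), rng.standard_normal(2)
for _ in range(500):
    x, y = two_timescale_alt_eg_step(P, B, Q, x, y, eta=0.05, tau=5.0)
print(np.linalg.norm(x), np.linalg.norm(y))
```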
Talk 3: H-duality: Duality between Minimizing Gradient Norm and Function Value
Speaker: Jaeyeon Kim
Abstract: In convex optimization, first-order methods that efficiently minimize function values have been a central subject of study since Nesterov's seminal work of 1983. More recently, Kim and Fessler's OGM-G and Lee et al.'s FISTA-G have been introduced as alternatives that instead focus on efficiently minimizing the gradient magnitude. In this talk, we present H-duality [1], a surprising one-to-one correspondence between methods that efficiently minimize function values and methods that efficiently minimize gradient magnitude. The H-dual of a given first-order method is defined as another first-order method whose coefficients are time-reversed. To the best of our knowledge, H-duality is distinct from Lagrange/Fenchel duality and from any previously known duality or symmetry relation. H-duality further extends to the mirror descent setup [2] and to fixed-point problems [3]. Using H-duality, we derive new accelerated first-order methods by simply taking the H-dual of existing methods. [1] Jaeyeon Kim, Asuman Ozdaglar, Chanwoo Park, and Ernest K. Ryu. Time-reversed dissipation induces duality between minimizing gradient norm and function value. Advances in Neural Information Processing Systems, 2023. [2] Jaeyeon Kim, Chanwoo Park, Asuman Ozdaglar, Jelena Diakonikolas, and Ernest K. Ryu. Mirror duality in convex optimization. arXiv preprint arXiv:2311.17296, 2023. [3] TaeHo Yoon, Jaeyeon Kim, Jaewook J. Suh, and Ernest K. Ryu. Optimal acceleration for minimax and fixed-point problems is not unique. International Conference on Machine Learning, 2024.
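At the coefficient level, the time reversal described in the abstract can be pictured as follows: encode an N-step fixed-step first-order method by a lower-triangular coefficient matrix H and obtain its H-dual by anti-transposing H. The sketch below uses an indexing convention and a toy coefficient matrix chosen as assumptions for illustration; see [1] for the precise definition.

```python
# Minimal sketch of the coefficient-level operation described in the abstract:
# an N-step method is encoded by a lower-triangular matrix H acting as
#   x_{k+1} = x_k - (1/L) * sum_{i <= k} H[k, i] * grad f(x_i),
# and its H-dual is obtained by time-reversing (anti-transposing) H.
# The indexing convention and the example H are illustrative assumptions.
import numpy as np

def h_dual(H):
    # Anti-transpose: (H^A)[i, j] = H[N-1-j, N-1-i] for an N x N matrix H.
    return H[::-1, ::-1].T

def run_method(H, grad, x0, L):
    # Execute the fixed-step first-order method encoded by H on gradient oracle `grad`.
    xs = [np.asarray(x0, dtype=float)]
    for k in range(H.shape[0]):
        step = sum(H[k, i] * grad(xs[i]) for i in range(k + 1))
        xs.append(xs[k] - step / L)
    return xs[-1]

# Usage on a toy L-smooth convex quadratic f(x) = 0.5 * L * ||x||^2.
L = 1.0
grad = lambda x: L * x
H = np.tril(np.full((4, 4), 0.5)) + 0.5 * np.eye(4)   # a hypothetical coefficient matrix
x_primal = run_method(H, grad, np.ones(3), L)          # the original method
x_dual = run_method(h_dual(H), grad, np.ones(3), L)    # its H-dual
print(x_primal, x_dual)
```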