Session: Advances in min-max optimization algorithms for machine learning
Chair: Ahmet Alacaoglu
Cluster: Optimization For Data Science
Talk 1: How to make the gradient descent-ascent converge to local minimax optima
Speaker: Donghwan Kim
Abstract: Can we effectively train a generative adversarial network (GAN) (or, equivalently, optimize a minimax problem) in the same way that we successfully train a classification neural network (or, equivalently, minimize a function) using gradient methods? Currently, the answer is 'No'. The remarkable success of gradient descent in minimization is supported by theoretical results: under mild conditions, gradient descent converges to a local minimizer and almost surely avoids strict saddle points. However, comparable theoretical support for minimax optimization is currently lacking. This talk will discuss recent progress in addressing this gap using dynamical systems theory. Specifically, it will present new variants of gradient descent-ascent that, under mild conditions, converge to local minimax optima, whereas existing gradient descent-ascent methods fail to do so.
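For reference, below is a minimal sketch (not the speaker's proposed variants) of plain simultaneous gradient descent-ascent on the toy bilinear objective f(x, y) = x*y, whose saddle point at the origin is the minimax solution; the objective, step size, and iteration count are illustrative assumptions. On this toy problem the iterates spiral away from the saddle point, illustrating the failure mode the abstract refers to.

```python
# Plain simultaneous gradient descent-ascent (GDA) on f(x, y) = x * y.
# The saddle point (0, 0) is the minimax solution, yet the iterates
# spiral outward; the objective and step size are illustrative assumptions.
import numpy as np

def grad_f(x, y):
    # f(x, y) = x * y, so df/dx = y and df/dy = x.
    return y, x

x, y = 1.0, 1.0
eta = 0.05  # step size (assumed)
for k in range(200):
    gx, gy = grad_f(x, y)
    x, y = x - eta * gx, y + eta * gy  # descent in x, ascent in y

print(np.hypot(x, y))  # distance from the saddle point grows with k
```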
Talk 2: Parameter-free second-order methods for min-max optimization
Speaker: Ali Kavis
Abstract: In this talk, I will present adaptive, line-search-free second-order methods with an optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line-search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic method and appropriately combine it with second-order information. Moreover, in contrast to common adaptive schemes, we define the step size recursively as a function of the gradient norm and the prediction error in the optimistic update. We first analyze a variant whose step size requires knowledge of the Lipschitz constant of the Hessian. Under the additional assumption of Lipschitz continuous gradients, we then design a parameter-free version by tracking the Hessian Lipschitz constant locally and ensuring that the iterates remain bounded. We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization. Inspired by the adaptive design of the step size, we propose a heuristic initialization rule that performs competitively across different problems and scenarios and eliminates the need to fine-tune the step size.
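As background for the talk, here is a minimal sketch of the first-order optimistic (Popov) update that the abstract builds on, applied to the toy convex-concave problem f(x, y) = x*y; the speakers' algorithms additionally incorporate second-order information and a recursively defined adaptive step size, which are not reproduced here. The objective and the fixed step size are illustrative assumptions.

```python
# First-order optimistic (Popov) method on the toy convex-concave
# problem f(x, y) = x * y; the associated operator is F(x, y) = (y, -x).
# The speakers' second-order, adaptive-step-size scheme is not shown;
# the objective and fixed step size below are illustrative assumptions.
import numpy as np

def F(z):
    x, y = z
    return np.array([y, -x])  # (df/dx, -df/dy)

eta = 0.1                      # fixed step size (assumed)
z = np.array([1.0, 1.0])
F_prev = F(z)
for k in range(500):
    Fz = F(z)
    # Optimistic update: extrapolate using the previous operator evaluation.
    z = z - eta * Fz - eta * (Fz - F_prev)
    F_prev = Fz

print(np.linalg.norm(z))       # shrinks toward 0, i.e., the saddle point (0, 0)
```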
Talk 3: TBD
Speaker: Jelena Diakonikolas
Abstract: TBD