Session: Bilevel Optimization - Algorithms
Chair: Yuchen Lou
Talk 1: A Fully First-Order Method for Stochastic Bilevel Optimization
Speaker: Dohyun Kwon
Abstract: The problem of stochastic bilevel optimization has been the focus of extensive study in recent years. Although many optimization methods have been proposed to address bilevel problems, existing approaches often require potentially expensive calculations involving the Hessians of lower-level objectives. The primary technical challenge lies in tracking the lower-level solutions as upper-level variables change. We introduce a Fully First-order Stochastic Approximation (F2SA) method [1, 2], which relies only on first-order gradient oracles. Additionally, we analyze the complexity of first-order methods under minimal assumptions and provide matching lower bounds [3]. This talk is based on joint work with Jeongyeol Kwon, Hanbaek Lyu, Stephen Wright, and Robert Nowak (UW-Madison).
[1] Kwon, J., Kwon, D., Wright, S., & Nowak, R. D. (2023). A fully first-order method for stochastic bilevel optimization. In Proceedings of the 40th International Conference on Machine Learning (pp. 18083-18113). PMLR. (ICML 2023, Oral)
[2] Kwon, J., Kwon, D., Wright, S., & Nowak, R. D. (2024). On penalty methods for nonconvex bilevel optimization and first-order stochastic approximation. In The Twelfth International Conference on Learning Representations. (ICLR 2024, Spotlight)
[3] Kwon, J., Kwon, D., & Lyu, H. (2024). On the complexity of first-order methods in stochastic bilevel optimization. In Proceedings of the 41st International Conference on Machine Learning (pp. 25784-25811). PMLR. (ICML 2024)
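A minimal sketch of the penalty-based, Hessian-free idea described above (illustrative only: the single-loop structure, step sizes, penalty schedule, deterministic gradients, and toy problem are assumptions for exposition, not the F2SA implementation of [1, 2]):

```python
import numpy as np

def first_order_bilevel(grad_f_x, grad_f_y, grad_g_x, grad_g_y,
                        x0, y0, T=2000, alpha=1e-2, beta=5e-2,
                        lam0=1.0, lam_growth=0.005):
    """Minimize F(x) = f(x, y*(x)) with y*(x) = argmin_y g(x, y), via the penalty
    L_lam(x) = min_y [f(x, y) + lam * g(x, y)] - lam * min_y g(x, y),
    using only gradients of f and g (no Hessian-vector products)."""
    x, y, z = x0.copy(), y0.copy(), y0.copy()
    for t in range(T):
        lam = lam0 + lam_growth * t                           # slowly increasing penalty
        y -= beta * grad_g_y(x, y)                            # y tracks argmin_y g(x, y)
        z -= beta * (grad_f_y(x, z) + lam * grad_g_y(x, z))   # z tracks argmin_y f + lam*g
        # First-order estimate of grad L_lam(x): no second-order information needed.
        hx = grad_f_x(x, z) + lam * (grad_g_x(x, z) - grad_g_x(x, y))
        x -= alpha * hx
    return x, y

# Toy usage: f(x, y) = 0.5||y - 1||^2 and g(x, y) = 0.5||y - x||^2, so y*(x) = x and
# the bilevel problem reduces to minimizing 0.5||x - 1||^2.
d = 3
x_opt, _ = first_order_bilevel(
    grad_f_x=lambda x, y: np.zeros(d),
    grad_f_y=lambda x, y: y - 1.0,
    grad_g_x=lambda x, y: x - y,
    grad_g_y=lambda x, y: y - x,
    x0=np.zeros(d), y0=np.zeros(d),
)
print(x_opt)  # approaches the all-ones vector
```

The point of the sketch is that the outer update uses only gradients of f and g, evaluated at two inner iterates: one tracking the lower-level minimizer and one tracking the minimizer of the penalized objective.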
Talk 2: Scalable Subspace Minimization for Parameter-Efficient Fine-Tuning in LLMs
Speaker: Yuchen Lou
Abstract: Given the growing sizes of language models, there is a pressing need for memory-efficient optimizers that enable scalable training and fine-tuning. A common strategy is to restrict updates to a small subset of parameters, significantly reducing memory usage. We propose a unified framework for such optimizers, specifically designed for Parameter-Efficient Fine-Tuning (PEFT). Grounded in classical subspace minimization, our approach achieves superior memory and computational efficiency compared to existing PEFT methods such as LoRA and GaLore. Crucially, it provides theoretical convergence guarantees---an element largely missing in current PEFT literature due to challenges arising from degenerate updates. Drawing on insights into the intrinsic dimensionality of fine-tuning, we introduce a novel algorithm that integrates well-established subspace optimization techniques with streaming PCA and a low-rank gradient restart mechanism. Our method matches state-of-the-art performance while significantly reducing memory overhead.
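A minimal sketch of the general pattern of restricting updates (and optimizer state) to a low-dimensional gradient subspace with periodic restarts (illustrative only: the projection rule, rank, and refresh schedule below are placeholders in the spirit of GaLore-style methods, not the algorithm presented in the talk):

```python
import numpy as np

def subspace_sgd(grad, W0, rank=4, lr=1e-2, momentum=0.9, steps=200, refresh_every=50):
    """SGD with momentum on a matrix parameter W, where each update and the optimizer
    state live in a low-dimensional subspace spanned by leading left singular vectors
    of a recent gradient; the subspace is refreshed periodically (a simple restart)."""
    W = W0.copy()
    P, M = None, None                          # subspace basis (d x r), momentum (r x n)
    for t in range(steps):
        G = grad(W)                            # full gradient of the loss at W
        if P is None or t % refresh_every == 0:
            # Refresh the subspace from the current gradient and reset the state.
            U, _, _ = np.linalg.svd(G, full_matrices=False)
            P = U[:, :rank]
            M = np.zeros((rank, G.shape[1]))
        R = P.T @ G                            # projected gradient: r x n, small memory
        M = momentum * M + R                   # optimizer state kept only in the subspace
        W -= lr * (P @ M)                      # update lies in span(P)
    return W

# Toy usage: quadratic loss 0.5 * ||W - W_star||_F^2, whose gradient is W - W_star.
rng = np.random.default_rng(0)
W_star = rng.standard_normal((64, 32))
W_hat = subspace_sgd(lambda W: W - W_star, W0=np.zeros((64, 32)))
print(np.linalg.norm(W_hat - W_star))  # decreases as the subspace is refreshed
```

The memory saving in this sketch comes from keeping the optimizer state only in the rank-r coordinates rather than for the full parameter matrix.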
Talk 3: A gradient-based method for bilevel optimization with convex lower-level problems
Speaker: Mattia Solla Saez
Abstract: This talk presents an algorithm to compute approximate solutions to a class of bilevel optimization problems. Recent advances in variational analysis have shown that the solution mapping of certain generalized equations is continuously differentiable at their nondegenerate solutions. We use this result to construct a sequence of approximate solutions of an appropriate regularization of the problem that converges to a feasible point of the original problem. Under an additional second-order assumption, the sequence is shown to converge to a stationary point of the original problem. Our framework allows for a wide class of nonsmooth problems in the lower level, including (convex) nonlinear programming problems and (convex) minimax problems. We show that, in these cases, the nondegeneracy assumption reduces to known conditions like strict complementarity, and discuss how to deal with the absence of this assumption algorithmically. This is joint work with Dr. Johannes Royset (royset@usc.edu).
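A minimal sketch of the overall pattern (regularize the lower-level problem so its solution mapping is differentiable, then take gradient steps on the resulting upper-level objective); this uses a smooth Tikhonov-regularized lower level and the implicit function theorem as a stand-in for the variational-analysis results the talk relies on, so the names and formulas below are assumptions for exposition only:

```python
import numpy as np

def hypergradient_step(x, grad_f_x, grad_f_y, solve_lower, hess_g_yy, cross_g_yx, lr=1e-2):
    """One gradient step on F(x) = f(x, y_mu(x)), where y_mu(x) solves the regularized
    lower-level problem min_y g(x, y) + (mu/2)||y||^2 (strongly convex, unique solution).
    The implicit function theorem gives dy/dx = -H_yy^{-1} J_yx, hence
    grad F(x) = grad_x f - J_yx^T H_yy^{-1} grad_y f, evaluated at (x, y_mu(x))."""
    y = solve_lower(x)                                       # lower-level solution at x
    v = np.linalg.solve(hess_g_yy(x, y), grad_f_y(x, y))     # adjoint solve with H_yy
    gF = grad_f_x(x, y) - cross_g_yx(x, y).T @ v             # J_yx = d(grad_y g)/dx
    return x - lr * gF

# Toy usage: g(x, y) = 0.5||y - x||^2 (so the regularized solution is y = x / (1 + mu))
# and f(x, y) = 0.5||y - 1||^2.
mu, d = 0.1, 3
x = np.zeros(d)
for _ in range(500):
    x = hypergradient_step(
        x,
        grad_f_x=lambda x, y: np.zeros(d),
        grad_f_y=lambda x, y: y - 1.0,
        solve_lower=lambda x: x / (1.0 + mu),
        hess_g_yy=lambda x, y: (1.0 + mu) * np.eye(d),       # includes the +mu*I term
        cross_g_yx=lambda x, y: -np.eye(d),
    )
print(x)  # approaches (1 + mu) * ones, so that y_mu(x) approaches the all-ones vector
```

In the smooth strongly convex setting the implicit derivative is available in closed form; the talk's contribution concerns the much broader nonsmooth setting, where differentiability of the solution mapping holds only at nondegenerate solutions.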