Session: Recent advances in first-order methods
Chair: Lexiao Lai
Cluster: Nonlinear Optimization
Talk 1: Heavy-ball ODE converges at rate $O(T^{-4/7})$ with averaging for nonconvex functions
Speaker: Naoki Marumo
Abstract: First-order optimization methods for nonconvex functions with Lipschitz continuous gradients and Hessians have been studied intensively. State-of-the-art methods that find an $\varepsilon$-stationary point within $O(\varepsilon^{-7/4})$ or $\tilde{O}(\varepsilon^{-7/4})$ gradient evaluations are based on Nesterov's accelerated gradient descent (AGD) or Polyak's heavy-ball method. However, these algorithms employ additional mechanisms, such as restart schemes and negative curvature exploitation, which complicate their behavior and make it challenging to apply them to more advanced settings (e.g., stochastic optimization). To obtain a simpler algorithm, we investigate the heavy-ball differential equation, a continuous-time analogue of the AGD and heavy-ball methods, and prove that its dynamics reach an $\varepsilon$-stationary point within $O(\varepsilon^{-7/4})$ time.
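For orientation, a standard continuous-time template for such dynamics is shown below; the precise damping schedule $\beta(t)$ and the averaging scheme analyzed in the talk are part of the speaker's contribution and are not specified here.
$$\ddot{X}(t) + \beta(t)\,\dot{X}(t) + \nabla f(X(t)) = 0, \qquad X(0) = x_0, \quad \dot{X}(0) = 0,$$
where a point $x$ is called $\varepsilon$-stationary when $\|\nabla f(x)\| \le \varepsilon$.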
Talk 2: On Optimal Smoothings of Convex Functions and Sets
Speaker: Thabo Samakhoana
Abstract: A nonsmooth objective function can be optimized at an improved rate by optimizing its smooth approximation at an accelerated rate. Given that better smoothings lead to improved convergence guarantees, it is natural to seek the best possible smoothing. In this work, we provide a theoretical study of optimal smoothings of functions and sets. In particular, we characterize the set of optimal smoothings of sublinear functions and cones. We also provide a framework for extending conic smoothings to functions and conic sections.
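As a concrete point of reference (a classical smoothing, not necessarily an optimal one in the sense studied in the talk), the sublinear function $f(x) = \max_{1 \le i \le n} x_i$ admits the log-sum-exp approximation
$$f_\mu(x) = \mu \log\!\left(\sum_{i=1}^{n} e^{x_i/\mu}\right), \qquad f(x) \le f_\mu(x) \le f(x) + \mu \log n,$$
whose gradient is $\tfrac{1}{\mu}$-Lipschitz; the talk asks which smoothings make this trade-off between approximation error and smoothness best possible.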
Talk 3: Inexact subgradient methods for semialgebraic functions
Speaker: Tam Le
Abstract: Motivated by the widespread use of approximate derivatives in machine learning and optimization, we study inexact subgradient methods with non-vanishing additive errors and step sizes. In the nonconvex semialgebraic setting, under boundedness assumptions, we prove that the method yields points that eventually fluctuate close to the critical set at a distance proportional to $\rho^{\gamma}$, where $\rho$ is the error in subgradient evaluation and $\gamma$ relates to the geometry of the problem. In the convex setting, we provide complexity results for the averaged values. We also obtain byproducts of independent interest, such as descent-like lemmas for nonsmooth nonconvex problems and some results on the limit of affine interpolants of differential inclusions.
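A minimal numerical sketch of the kind of iteration described above, assuming a toy $\ell_1$ objective and an artificially bounded additive perturbation of the subgradient oracle (all names, step sizes, and error levels below are illustrative, not the speaker's):

    import numpy as np

    def inexact_subgradient_method(subgrad, x0, step=1e-2, rho=1e-2, iters=2000, seed=0):
        """Subgradient method with a non-vanishing step and additive errors of norm <= rho."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            g = subgrad(x)                                # exact subgradient oracle
            e = rng.normal(size=x.shape)
            e *= rho / max(np.linalg.norm(e), 1e-12)      # additive error with ||e|| <= rho
            x = x - step * (g + e)                        # inexact subgradient step
        return x

    # Example: f(x) = ||x||_1 is nonsmooth and semialgebraic; sign(x) is a valid subgradient.
    x_last = inexact_subgradient_method(np.sign, x0=np.ones(5))
    print(x_last)  # iterates end up fluctuating near the critical set {0}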