Thursday July 24, 2025 4:15pm - 5:30pm PDT
Session: Neural Networks and Optimization
Chair: Yiping Lu

Talk 1: Towards Quantifying the Hessian Structure of Neural Networks
Speaker: Yushun Zhang
Abstract: Empirical studies have reported that the Hessian of neural networks (NNs) exhibits a near-block-diagonal structure, yet its theoretical foundation remains unclear. In this work, we provide a rigorous theoretical analysis of the Hessian structure of NNs at random initialization. We study linear models and 1-hidden-layer networks with the mean-squared-error (MSE) loss and the cross-entropy (CE) loss for classification tasks. By leveraging random matrix theory, we compare the limiting distributions of the diagonal and off-diagonal Hessian blocks and find that the block-diagonal structure arises as $C \rightarrow \infty$, where $C$ denotes the number of classes. Our findings reveal that $C$ is a primary driver of the block-diagonal structure. These results may shed new light on the Hessian structure of large language models (LLMs), which typically operate with a large $C$, often exceeding $10^4$ or $10^5$.
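
The role of the class count can be probed numerically. Below is a minimal sketch (not code from the talk) for a linear softmax model at random initialization: the cross-entropy Hessian splits into class-indexed blocks $H_{cc'} = \frac{1}{n}\sum_i p_{ic}(\delta_{cc'} - p_{ic'})\, x_i x_i^\top$, and the ratio of diagonal to off-diagonal block norms grows with $C$. The dimensions and the norm-ratio diagnostic are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)

def block_norm_ratio(C, d=10, n=200):
    """Mean Frobenius norm of diagonal vs. off-diagonal class blocks of the
    cross-entropy Hessian of a linear softmax model at random initialization."""
    X = rng.standard_normal((n, d))
    W = rng.standard_normal((C, d)) / np.sqrt(d)          # random initialization
    Z = X @ W.T
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)                     # softmax probabilities
    diag, off = [], []
    for c in range(C):
        for cp in range(C):
            w = P[:, c] * ((c == cp) - P[:, cp])          # per-sample block weight
            B = (X * w[:, None]).T @ X / n                # H_{c,c'} = (1/n) sum_i w_i x_i x_i^T
            (diag if c == cp else off).append(np.linalg.norm(B))
    return np.mean(diag) / np.mean(off)

for C in (3, 10, 30, 100):
    print(f"C = {C:3d}: diag / off-diag block norm ratio ~ {block_norm_ratio(C):.1f}")

Because the diagonal blocks carry weights of order $p_{ic} \sim 1/C$ while the off-diagonal blocks carry weights of order $p_{ic} p_{ic'} \sim 1/C^2$, the printed ratio grows roughly linearly in $C$, consistent with the abstract's claim that large $C$ drives the block-diagonal structure.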

Talk 2: Multiscale Behavior of Gradient Descent at the Edge of Stability: Central Flow as a Boundary Layer
Speaker: Yiping Lu
Abstract: Understanding optimization in deep learning is challenging due to complex oscillatory dynamics known as the “edge of stability.” In this regime, gradient flow no longer serves as an accurate proxy for gradient descent. In this talk, we adopt a fast-slow differential equation approach to characterize both the oscillatory dynamics and the self-stabilizing behavior of gradient descent when operating at a large learning rate. Using singular perturbation theory, we describe the behavior near stationary manifolds as a boundary layer—analogous to the thin layer of fluid flowing immediately adjacent to a bounding surface. This boundary layer approximation captures the essential dynamics of gradient descent at the edge of stability.
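
One way to make the fast-slow picture concrete (a sketch under a local quadratic model, not the talk's analysis): near an iterate with Hessian eigenpairs $(\lambda_i, u_i)$, the gradient descent error component along $u_i$ evolves as $c_{i,k+1} = (1 - \eta \lambda_i)\, c_{i,k}$. Directions with $\eta \lambda_i \ll 1$ contract slowly and track gradient flow, giving the slow dynamics near the stationary manifold, while a direction with $\eta \lambda_i \approx 2$ satisfies $|1 - \eta \lambda_i| \approx 1$ and flips sign every step, producing the period-two oscillation characteristic of the edge of stability. The boundary-layer (central flow) viewpoint described in the talk averages out this fast oscillation to recover an effective slow equation for the remaining coordinates.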

Talk 3: Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Structured Neural Networks
Speaker: Ching-pei Lee
Abstract: We propose a Regularized Adaptive Momentum Dual Averaging (RAMDA) algorithm for training neural networks with a regularization term that promotes desired structures. As with existing regularized adaptive methods that adopt coordinate-wise scaling, the subproblem for computing RAMDA's update direction involves a nonsmooth regularizer and a diagonal preconditioner, and therefore has no closed-form solution in general. We thus carefully devise an implementable inexactness condition that retains convergence guarantees similar to those of the exact version, and show that this condition can be satisfied quickly by applying the standard proximal gradient method to RAMDA's subproblem. We show asymptotic variance reduction for RAMDA, and further leverage the theory of manifold identification to prove that, even in the presence of such inexactness, the iterates of RAMDA attain, after a finite number of steps, the ideal structure induced by the partly smooth regularizer at the stationary point of asymptotic convergence. This structure is locally optimal near the point of convergence, making RAMDA the first regularized adaptive method that outputs locally optimally structured models. Extensive numerical experiments on training modern neural network models for computer vision, language modeling, and speech tasks show that RAMDA is efficient and consistently outperforms the state of the art for training structured neural networks, in terms of both the induced structure and the predictive power of the resulting models. This is joint work with Zih-Syuan Huang.
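
As a rough illustration of the kind of subproblem involved, the sketch below applies the standard proximal gradient method to a diagonally preconditioned, group-lasso-regularized quadratic model. This is not the authors' implementation: the group-lasso regularizer, the toy preconditioner, and the stopping rule standing in for the paper's inexactness condition are all illustrative assumptions.

import numpy as np

def group_prox(z, groups, tau):
    """Proximal operator of tau * sum_g ||z_g||_2 (group soft-thresholding)."""
    out = z.copy()
    for g in groups:
        nrm = np.linalg.norm(z[g])
        out[g] = 0.0 if nrm <= tau else (1.0 - tau / nrm) * z[g]
    return out

def solve_subproblem(g, v, w, groups, lam, max_iter=500, tol=1e-8):
    """Approximately solve
        min_z  g^T (z - w) + 0.5 (z - w)^T diag(v) (z - w) + lam * sum_g ||z_g||_2
    by proximal gradient; the stopping rule is an illustrative stand-in for the
    paper's inexactness condition."""
    z = w.copy()
    alpha = 1.0 / v.max()                       # step size from the largest diagonal entry
    for _ in range(max_iter):
        grad = g + v * (z - w)                  # gradient of the smooth quadratic part
        z_new = group_prox(z - alpha * grad, groups, alpha * lam)
        if np.linalg.norm(z_new - z) <= tol * (1.0 + np.linalg.norm(z)):
            return z_new
        z = z_new
    return z

# Toy usage: 12 parameters in 4 groups of 3, with a diagonal (Adam-like) preconditioner v.
rng = np.random.default_rng(0)
n = 12
groups = [list(range(i, i + 3)) for i in range(0, n, 3)]
w, g = rng.standard_normal(n), rng.standard_normal(n)
v = rng.uniform(0.5, 2.0, n)
z = solve_subproblem(g, v, w, groups, lam=2.0)
print("group norms of the update:", [round(float(np.linalg.norm(z[gr])), 3) for gr in groups])

The group prox can set entire groups exactly to zero, which is the kind of structure a partly smooth regularizer promotes; RAMDA's contribution is to guarantee that such structure is identified even when the subproblem is solved only inexactly.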

Speakers
Yushun Zhang

Yiping Lu

Ching-pei Lee

Location: Taper Hall (THH) 116, 3501 Trousdale Pkwy, Los Angeles, CA 90089
