Session: Data-Driven Optimization with Structured Models
Chair: Krishnakumar Balasubramanian
Cluster: Optimization Under Uncertainty and Data-driven Optimization
Talk 1: Feature Learning in Two-layer Neural Networks under Structured Data
Speaker: Murat Erdogdu
Abstract: We study the effect of gradient-based optimization on feature learning in two-layer neural networks. We consider a setting where the number of samples is of the same order as the input dimension and show that, when the input data is isotropic, gradient descent always improves upon the initial random features model in terms of prediction risk, for a certain class of targets. Further leveraging the practical observation that data often contains additional structure, namely that the input covariance has non-trivial alignment with the target, we prove that the class of learnable targets can be significantly extended, demonstrating a clear separation between kernel methods and two-layer neural networks in this regime. We additionally consider sparse settings and show that pruning methods can lead to optimal sample complexity.
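
Illustrative sketch (Python/NumPy), not the speaker's code: a minimal version of the setting above, with a single-index target, a spiked input covariance whose spike aligns with the target direction, one gradient step on the first layer of a two-layer ReLU network, and a ridge fit of the second layer. The link function, the spike strength, the step size, and all problem sizes are assumptions made purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, m = 256, 512, 400                 # input dimension, samples (n ~ d), hidden width
    alpha, lam = 4.0, 1e-3                  # covariance spike strength, ridge parameter
    eta = np.sqrt(m)                        # illustrative (large) first-layer step size

    theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)
    g = lambda z: np.maximum(z, 0.0) - 0.5  # illustrative nonlinear link

    def sample(n_samples):
        z = rng.standard_normal((n_samples, d))
        x = z + np.sqrt(alpha) * np.outer(z @ theta, theta)   # covariance spike along theta
        return x, g(x @ theta)                                # single-index target along theta

    X, y = sample(n)
    W = rng.standard_normal((m, d)) / np.sqrt(d)              # random first-layer weights
    a0 = rng.standard_normal(m) / np.sqrt(m)                  # random second-layer weights

    # One gradient step on the first layer (the feature-learning step)
    H = np.maximum(X @ W.T, 0.0)                              # ReLU features, n x m
    resid = H @ a0 - y
    grad_W = ((resid[:, None] * a0[None, :] * (H > 0)).T @ X) / n
    W1 = W - eta * grad_W

    def ridge_risk(Wmat):
        # Fit the second layer by ridge regression, then estimate test risk.
        Htr = np.maximum(X @ Wmat.T, 0.0)
        a = np.linalg.solve(Htr.T @ Htr / n + lam * np.eye(m), Htr.T @ y / n)
        Xte, yte = sample(2000)
        pred = np.maximum(Xte @ Wmat.T, 0.0) @ a
        return np.mean((pred - yte) ** 2)

    print("random features risk :", ridge_risk(W))
    print("after one GD step    :", ridge_risk(W1))

Comparing the two printed risks gives only a rough sense of the improvement of trained features over the initial random features; the sizes here are far too small to reflect the proportional-regime theory quantitatively.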
Talk 2: Control, Transport and Sampling: Towards Better Loss Design
Speaker: Qijia Jiang
Abstract: Leveraging connections between diffusion-based sampling, optimal transport, and stochastic optimal control through their shared links to the Schrödinger bridge problem, we propose novel objective functions that can be used to transport ν to μ, and consequently to sample from the target μ, via optimally controlled dynamics. We highlight the importance of the pathwise perspective and the role that various optimality conditions on the path measure can play in the design of valid training losses, the careful choice of which offers numerical advantages in implementation. Basing the formalism on the Schrödinger bridge comes with the additional practical capability of baking inductive bias into neural network training.
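
Illustrative sketch (Python/PyTorch), not the talk's proposed objective: one classical control formulation of sampling, the Schrödinger half-bridge with a Dirac initial law (the Föllmer drift). A learned drift is trained by minimizing the path-space KL divergence to the bridge, which by Girsanov reduces to a running control cost plus a terminal reweighting term. The target, network, discretization, and hyperparameters are illustrative assumptions.

    import math
    import torch
    import torch.nn as nn

    d, K, n_paths = 2, 50, 256               # dimension, time steps, paths per batch
    dt = 1.0 / K

    def log_mu(x):                            # unnormalized log target: a two-component Gaussian mixture
        means = torch.tensor([[2.5, 0.0], [-2.5, 0.0]])
        logs = -0.5 * ((x[:, None, :] - means[None]) ** 2).sum(-1)
        return torch.logsumexp(logs, dim=1)

    def log_gamma(x):                         # log density of N(0, I), the uncontrolled terminal law
        return -0.5 * (x ** 2).sum(-1) - 0.5 * d * math.log(2 * math.pi)

    drift = nn.Sequential(nn.Linear(d + 1, 64), nn.SiLU(),
                          nn.Linear(64, 64), nn.SiLU(),
                          nn.Linear(64, d))
    opt = torch.optim.Adam(drift.parameters(), lr=1e-3)

    for step in range(1000):
        x = torch.zeros(n_paths, d)           # X_0 = 0: a Dirac initial law nu
        cost = torch.zeros(n_paths)
        for k in range(K):
            t = torch.full((n_paths, 1), k * dt)
            u = drift(torch.cat([x, t], dim=1))
            cost = cost + 0.5 * (u ** 2).sum(-1) * dt              # running control energy
            x = x + u * dt + math.sqrt(dt) * torch.randn_like(x)   # Euler-Maruyama step
        loss = (cost - log_mu(x) + log_gamma(x)).mean()            # terminal reweighting term
        opt.zero_grad(); loss.backward(); opt.step()
    # After training, re-simulating the controlled SDE yields approximate samples from mu.

This baseline differentiates through the simulated trajectories and is included only as a point of comparison; the pathwise optimality conditions discussed in the abstract concern the design of training losses with better numerical behavior than such a naive objective.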
Talk 3: Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models
Speaker: Mladen Kolar
Abstract: In this work, we consider solving optimization problems with a stochastic objective and deterministic equality constraints. We propose a trust-region sequential quadratic programming (TR-SQP) method to find both first- and second-order stationary points. Our method uses a random model to represent the objective function; the model is constructed from stochastic observations of the objective and is designed to satisfy proper adaptive accuracy conditions with a high but fixed probability. To converge to first-order stationary points, our method computes in each iteration a gradient step, defined by minimizing a quadratic approximation of the objective subject to a (relaxed) linearization of the problem constraints and a trust-region constraint. To converge to second-order stationary points, our method additionally computes an eigen step that explores negative curvature of the reduced Hessian matrix, as well as a second-order correction step to address the potential Maratos effect, which arises from the nonlinearity of the problem constraints and may prevent the method from moving away from saddle points. Both the gradient and eigen step computations leverage a novel parameter-free decomposition of the step and of the trust-region radius, accounting for the proportions among the feasibility residual, the optimality residual, and the negative curvature. We establish global almost-sure first- and second-order convergence guarantees for our method, and we present computational results on CUTEst problems, regression problems, and saddle-point problems to demonstrate its superiority over existing line-search-based stochastic methods. Joint work with Yuchen Fang, Sen Na, and Michael W. Mahoney.
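
Illustrative sketch (Python/NumPy), not the authors' algorithm: one simplified iteration of a trust-region SQP gradient step for a stochastic objective with equality constraints, using the classical Byrd-Omojokun normal/tangential decomposition in place of the parameter-free decomposition described above. The toy problem, the identity Hessian approximation, and the omission of step acceptance and radius updates are all assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    def grad_f_stochastic(x, batch=32):       # noisy gradient of f(x) = 0.5 * ||x||^2
        noise = rng.standard_normal((batch, x.size)).mean(0)
        return x + 0.1 * noise

    def c(x):                                  # single equality constraint: x_0 + x_1 - 1 = 0
        return np.array([x[0] + x[1] - 1.0])

    def jac_c(x):
        return np.array([[1.0, 1.0, 0.0]])

    def tr_sqp_step(x, delta, zeta=0.8):
        g, J, cx = grad_f_stochastic(x), jac_c(x), c(x)
        # Normal step: reduce constraint violation, restricted to a fraction of the radius.
        v = -J.T @ np.linalg.solve(J @ J.T, cx)        # minimum-norm step onto the linearized constraint
        nv = np.linalg.norm(v)
        if nv > zeta * delta:
            v *= zeta * delta / nv
        # Tangential step: minimize a quadratic model (identity Hessian approximation)
        # of the objective over the null space of J, within the remaining radius.
        P = np.eye(x.size) - J.T @ np.linalg.solve(J @ J.T, J)   # projector onto null(J)
        gt = P @ g                                               # reduced (projected) gradient
        rem = np.sqrt(max(delta ** 2 - v @ v, 0.0))              # v and t are orthogonal, so ||v + t|| <= delta
        t = -min(rem, np.linalg.norm(gt)) * gt / (np.linalg.norm(gt) + 1e-12)
        return x + v + t

    x, delta = np.array([2.0, -1.0, 0.5]), 1.0
    for _ in range(20):
        x = tr_sqp_step(x, delta)
    print("iterate:", x, "constraint residual:", c(x))

A full method of the kind described in the abstract would add the adaptive accuracy conditions on the random model, an acceptance test with trust-region radius updates, and the eigen and second-order correction steps.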