Session: Advances in Solving Large-Scale Problems: Accelerated Methods and Sharp Analyses (I)
Chairs: Sen Na, Liwei Jiang
Cluster: Optimization Under Uncertainty and Data-driven Optimization
Talk 1: Implicit Preconditioning in Stochastic Linear System Solvers
Speaker: Michał Dereziński
Abstract: In this talk, I will present new algorithms and convergence guarantees for solving linear systems via sketch-and-project, a framework that unifies many known iterative methods based on randomized subsampling and sketching, including randomized Kaczmarz, coordinate descent, and others. Our new results uncover a connection between stochastic iterative solvers and sketching-based randomized preconditioning algorithms: whenever the spectral structure of a linear system is amenable to constructing a strong preconditioner via low-rank approximation, one can construct a sketch-and-project solver that implicitly takes advantage of this spectral structure. In particular, I will show how this leads to solving an n x n linear system with at most k large (outlying) singular values in ~O(n^2 + nk^2) arithmetic operations, which is faster than the ~O(n^2 k) cost of constructing a good preconditioner for a deterministic iterative solver such as conjugate gradient.
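For readers unfamiliar with the framework, the following is a minimal Python sketch of a sketch-and-project iteration for A x = b. The Gaussian block sketch, the block size s, and the iteration budget are illustrative assumptions for this sketch, not the algorithms analyzed in the talk.

    # Minimal sketch-and-project iteration for A x = b.
    # Each step draws a random sketch S and projects the current iterate
    # onto the sketched constraint set {x : S^T A x = S^T b}.
    # Taking S to be a random coordinate vector recovers randomized Kaczmarz;
    # here we use a Gaussian block sketch purely for illustration.
    import numpy as np

    def sketch_and_project(A, b, s=20, iters=2000, seed=0):
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = np.zeros(n)
        for _ in range(iters):
            S = rng.standard_normal((m, s))   # Gaussian sketching matrix
            SA = S.T @ A                      # sketched system, s x n
            r = SA @ x - S.T @ b              # sketched residual
            # Projection step: x <- x - A^T S (S^T A A^T S)^+ S^T (A x - b)
            x -= SA.T @ np.linalg.lstsq(SA @ SA.T, r, rcond=None)[0]
        return x

    # Quick check on a well-conditioned consistent system.
    rng = np.random.default_rng(1)
    n = 100
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = U @ np.diag(np.linspace(1.0, 10.0, n)) @ V.T
    x_true = rng.standard_normal(n)
    x = sketch_and_project(A, A @ x_true)
    print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

Note that each iteration touches only the sketched s x n system, which is what lets the method exploit favorable spectral structure implicitly rather than by building a preconditioner up front.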
Talk 2: The Radius of Statistical Efficiency
Speaker: Mateo Díaz
Abstract: Classical results in asymptotic statistics show that the Fisher information matrix controls the difficulty of estimating a statistical model from observed data. In this work, we introduce a companion measure of the robustness of an estimation problem: the radius of statistical efficiency (RSE) is the size of the smallest perturbation to the problem data that renders the Fisher information matrix singular. We compute the RSE up to numerical constants for a variety of testbed problems, including principal component analysis, generalized linear models, phase retrieval, bilinear sensing, and matrix completion. In all cases, the RSE quantifies the compatibility between the covariance of the population data and the latent model parameter. Interestingly, we observe a precise reciprocal relationship between the RSE and the intrinsic complexity/sensitivity of the problem instance, paralleling the classical Eckart–Young theorem in numerical analysis.
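As a toy numerical illustration of the definition (not the computations from the talk), the snippet below forms the empirical Fisher information of a linear regression model with Gaussian noise; by an Eckart–Young-type fact, the spectral-norm distance from this matrix to the nearest singular matrix is its smallest eigenvalue. The model, dimensions, and noise level are illustrative assumptions.

    # For linear regression y = <x, theta> + N(0, sigma^2), the Fisher
    # information is I(theta) = E[x x^T] / sigma^2. The smallest perturbation
    # (in spectral norm) that makes a symmetric PD matrix singular has size
    # lambda_min, which is this toy stand-in for the RSE.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma = 5000, 10, 1.0
    X = rng.standard_normal((n, d)) @ np.diag(np.linspace(0.2, 2.0, d))

    fisher = (X.T @ X) / (n * sigma**2)       # empirical Fisher information
    eigvals = np.linalg.eigvalsh(fisher)      # ascending eigenvalues

    # Spectral-norm distance from `fisher` to the set of singular matrices.
    print("lambda_min (distance to singularity):", eigvals[0])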
Talk 3: Low-Rank Approximation for Faster Optimization
Speaker: Madeleine Udell
Abstract: Low-rank structure is pervasive in real-world datasets. This talk shows how to accelerate the solution of fundamental computational problems, including eigenvalue decomposition, linear system solves, composite convex optimization, and stochastic optimization (including deep learning), by exploiting this low-rank structure. We present a simple method based on randomized numerical linear algebra for efficiently computing approximate top eigendecompositions, which can be used to replace large matrices (such as Hessians and constraint matrices) with low-rank surrogates that are faster to apply and invert. The resulting solvers for linear systems (NystromPCG), composite convex optimization (NysADMM), and stochastic optimization (SketchySGD and PROMISE) come with strong theoretical guarantees and numerical evidence, outperforming state-of-the-art methods in speed and robustness to hyperparameters.
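As a hedged illustration of the underlying technique, the snippet below computes a randomized Nystrom approximation of a symmetric PSD matrix from r matrix-vector products. The rank r, the test matrix, and the stabilizing shift are illustrative choices; this is a generic sketch of the approach, not the exact recipe inside NystromPCG.

    # Randomized Nystrom approximation: H ~= U @ diag(lam) @ U.T of rank r,
    # built from a single sketch Y = H @ Omega (r matvecs with H). The small
    # shift nu guards the Cholesky factorization against roundoff.
    import numpy as np

    def nystrom_approx(H, r, seed=0):
        rng = np.random.default_rng(seed)
        n = H.shape[0]
        Omega, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal test matrix
        Y = H @ Omega                                         # sketch: r matvecs
        nu = np.sqrt(n) * np.finfo(Y.dtype).eps * np.linalg.norm(Y)
        Yn = Y + nu * Omega                                   # stabilizing shift
        L = np.linalg.cholesky(Omega.T @ Yn)                  # r x r core factor
        B = np.linalg.solve(L, Yn.T).T                        # B = Yn @ inv(L).T
        U, s, _ = np.linalg.svd(B, full_matrices=False)
        lam = np.maximum(s**2 - nu, 0.0)                      # remove the shift
        return U, lam

    # Example: approximate a PSD matrix with fast spectral decay.
    rng = np.random.default_rng(1)
    G = rng.standard_normal((200, 200))
    H = G @ np.diag(0.5 ** np.arange(200)) @ G.T              # effectively low rank
    U, lam = nystrom_approx(H, r=20)
    print(np.linalg.norm(H - (U * lam) @ U.T) / np.linalg.norm(H))

A surrogate of this form is cheap to apply and, via the Woodbury identity, cheap to invert after a regularizing shift, which is what makes it useful as a preconditioner or Hessian stand-in in the solvers described above.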