Session: Nonconvex optimization with applications in high-dimensional statistics
Chair: Andrew McRae
Cluster: Nonlinear Optimization
Talk 1: Near optimal sample complexity for the matrix and tensor normal model via geodesic convexity
Speaker: Akshay Ramachandran
Abstract: The matrix normal model, the family of Gaussian matrix-variate distributions whose covariance matrix is the Kronecker product of two lower-dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor normal models. We show optimal nonasymptotic bounds for the error achieved by the maximum likelihood estimator (MLE) in several natural metrics. In contrast to existing bounds, our results do not rely on the factors being well-conditioned or sparse. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and for the overall covariance matrix are minimax optimal up to constant factors, provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that an iterative procedure to compute the MLE, known as the flip-flop algorithm, converges linearly with high probability. Our main tool is geodesic strong convexity in the geometry on positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels.
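For context, the flip-flop algorithm referenced above has a simple alternating form in the two-factor (matrix normal) case: each step is a closed-form update of one Kronecker factor with the other held fixed. The sketch below is a minimal illustration of that standard iteration, assuming centered p-by-q samples and a trace normalization to fix the scale ambiguity of the Kronecker factorization; it is not taken from the talk itself.

```python
import numpy as np

def flip_flop(samples, num_iters=50, tol=1e-8):
    """Minimal sketch of the flip-flop iteration for the matrix normal MLE.

    samples: array of shape (n, p, q); each sample is a centered p-by-q
    matrix whose vectorization has covariance Sigma1 (rows) kron Sigma2
    (columns). The factors are only identifiable up to a scalar, so the
    scale is fixed by normalizing trace(Sigma2) = q.
    """
    n, p, q = samples.shape
    Sigma1, Sigma2 = np.eye(p), np.eye(q)
    for _ in range(num_iters):
        # Update the row factor with the column factor held fixed.
        S2_inv = np.linalg.inv(Sigma2)
        Sigma1_new = sum(X @ S2_inv @ X.T for X in samples) / (n * q)
        # Update the column factor with the new row factor held fixed.
        S1_inv = np.linalg.inv(Sigma1_new)
        Sigma2_new = sum(X.T @ S1_inv @ X for X in samples) / (n * p)
        # Fix the scale ambiguity (the Kronecker product is unchanged).
        scale = np.trace(Sigma2_new) / q
        Sigma2_new /= scale
        Sigma1_new *= scale
        done = (np.linalg.norm(Sigma1_new - Sigma1) < tol
                and np.linalg.norm(Sigma2_new - Sigma2) < tol)
        Sigma1, Sigma2 = Sigma1_new, Sigma2_new
        if done:
            break
    return Sigma1, Sigma2
```

Each update maximizes the likelihood in one factor with the other held fixed; the talk's analysis, via geodesic strong convexity, addresses when this alternating scheme converges linearly.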
Talk 2: Nonconvex landscapes for phase retrieval
Speaker: Andrew McRae
Abstract: In phase retrieval, we want to recover a signal from the magnitudes of linear measurements. The nonlinearity of the absolute value makes a simple least-squares estimator a nonconvex optimization problem. Nevertheless, nonconvex methods work well empirically, and there is some limited theory explaining this when the measurements are made randomly. I will present some general deterministic results on when a common smooth but nonconvex quartic formulation has a benign landscape: that is, the only local optimum is, in fact, the true signal we want to recover. This recovers existing landscape results with a simpler and more general analysis. Furthermore, even when this result fails, I show that we can often still obtain a benign landscape by (slightly) relaxing the problem. This brings the benefits of the convex semidefinite PhaseLift relaxation while maintaining a far smaller optimization problem.
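To make the objects in the abstract concrete, here is a minimal numpy sketch of the usual smooth quartic (intensity least-squares) formulation and, as one possible reading of "(slightly) relaxing the problem", a rank-k lift in the spirit of low-rank versions of PhaseLift. The real-valued measurements, the specific relaxation, and the function names are illustrative assumptions, not necessarily those used in the talk.

```python
import numpy as np

def quartic_loss_and_grad(x, A, b):
    """Smooth quartic phase-retrieval objective
    f(x) = (1/m) * sum_i ((a_i^T x)^2 - b_i)^2 and its gradient,
    assuming real measurement vectors a_i (the rows of A) and
    observed intensities b_i = (a_i^T x_true)^2."""
    Ax = A @ x
    r = Ax ** 2 - b                    # residuals of the squared measurements
    loss = np.mean(r ** 2)
    grad = (4.0 / len(b)) * (A.T @ (r * Ax))
    return loss, grad

def relaxed_loss_and_grad(X, A, b):
    """One way to 'slightly relax' the problem (an assumption here): replace
    the vector x by a thin n-by-k matrix X and fit ||X^T a_i||^2 to b_i.
    The variable stays far smaller than the full n-by-n PhaseLift matrix."""
    AX = A @ X                         # shape (m, k)
    r = np.sum(AX ** 2, axis=1) - b    # residuals, shape (m,)
    loss = np.mean(r ** 2)
    grad = (4.0 / len(b)) * (A.T @ (AX * r[:, None]))
    return loss, grad
```

If such a relaxation has a benign landscape, a candidate signal can be read off from the leading singular direction of the optimizer, up to the inherent global sign ambiguity; the precise conditions and guarantees are those of the talk, not of this sketch.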
Talk 3: Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth
Speaker: Liwei Jiang
Abstract: A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution, which we call the ravine, such that the function grows at least quadratically away from the ravine and has constant-order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory and algorithm on the problems of matrix sensing and factorization and of learning a single neuron in the overparameterized regime.
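The interlacing idea in the abstract (many short gradient steps followed by one long Polyak gradient step) can be written schematically as below. The inner/outer loop structure, the constant short stepsize, and the assumption that the optimal value f_star is known (taken to be zero by default) are placeholders for illustration; this is not the paper's actual adaptive stepsize rule.

```python
import numpy as np

def interlaced_gd(f, grad, x0, f_star=0.0, short_step=1e-3,
                  inner_steps=20, outer_iters=100):
    """Schematic sketch: interlace many short constant-stepsize gradient
    steps with a single long Polyak step, whose stepsize is
    (f(x) - f_star) / ||grad f(x)||^2. The short steps are meant to pull
    the iterate toward the quadratically sharp directions (off the
    ravine), while the long Polyak step makes progress along it."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(outer_iters):
        # Many short gradient steps.
        for _ in range(inner_steps):
            x = x - short_step * grad(x)
        # One long Polyak step.
        g = grad(x)
        gap = f(x) - f_star
        sq_norm = float(np.sum(g * g))
        if sq_norm > 0.0 and gap > 0.0:
            x = x - (gap / sq_norm) * g
    return x
```

For the matrix factorization and single-neuron examples mentioned above, the loss is typically flat (fourth-order) near its minimizers in the overparameterized regime, which is precisely where a plain constant stepsize slows down and an adaptive rule of this flavor can help.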