Session: Frontiers of Optimization for Machine Learning - Part I
Chair: Fred Roosta
Cluster: Nonlinear Optimization
Talk 1: A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods
Speaker: Andre Milzarek
Abstract: In this talk, we discuss a novel analysis framework for non-descent-type optimization methods in nonconvex settings based on the Kurdyka-Łojasiewicz property. The proposed framework covers a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the asymptotic analysis of first-order methods that lack a sufficient descent property and do not require access to full (deterministic) gradient information. We leverage this framework to establish, for the first time, iterate convergence and corresponding rates of convergence for the decentralized gradient method and federated averaging under mild assumptions. Additional applications of the developed analysis techniques and new convergence results will be shown (e.g., for random reshuffling and the stochastic gradient descent method) to illustrate the generality of the approach.
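Random reshuffling, one of the methods the abstract mentions, is a typical non-descent method: individual component steps may increase the objective, so classical descent-based KL analyses do not apply directly. A minimal sketch on a toy least-squares problem (the data and step size are illustrative assumptions, not from the talk):

```python
import random

# Toy problem: minimize f(x) = (1/n) * sum_i (a_i * x - b_i)^2 via
# random reshuffling (RR): each epoch visits every component gradient
# exactly once, in a freshly shuffled order. Individual steps need not
# decrease f, which is the non-descent setting the talk's framework targets.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (0.5, 1.0)]  # hypothetical data

def component_grad(x, a, b):
    # gradient of the single component (a*x - b)^2 with respect to x
    return 2.0 * a * (a * x - b)

def random_reshuffling(x0, step=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)               # new permutation each epoch
        for i in order:
            a, b = data[i]
            x -= step * component_grad(x, a, b)
    return x

x_rr = random_reshuffling(0.0)
```

For this quadratic toy problem the minimizer has the closed form sum(a_i*b_i)/sum(a_i^2), and the RR iterates settle into a small neighborhood of it whose size shrinks with the step size.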
Talk 2: Rockafellian Relaxation and Stochastic Optimization Under Perturbations
Speaker: Johannes Royset
Abstract: In practice, optimization models are often prone to unavoidable inaccuracies because of dubious assumptions and corrupted data. Traditionally, this has placed special emphasis on risk-based and robust formulations, with their focus on “conservative” decisions. We develop, in contrast, an “optimistic” framework based on Rockafellian relaxations in which optimization is conducted not only over the original decision space but also jointly with a choice of model perturbation. The framework enables us to address challenging problems with ambiguous probability distributions from the areas of two-stage stochastic optimization without relatively complete recourse, probability functions lacking continuity properties, expectation constraints, and outlier analysis. We are also able to circumvent the fundamental difficulty in stochastic optimization that convergence of distributions fails to guarantee convergence of expectations. The framework centers on the novel concepts of exact and limit-exact Rockafellians, with interpretations of “negative” regularization emerging in certain settings. We illustrate the role of Phi-divergence, examine rates of convergence under changing distributions, and explore extensions to first-order optimality conditions. The main development is free of assumptions about convexity, smoothness, and even continuity of objective functions. Numerical results in the settings of computer vision and text analytics with label noise illustrate the framework.
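The "optimistic" idea of optimizing jointly over decisions and model perturbations can be illustrated on a scalar least-squares fit with label noise: instead of min over x of sum_i (a_i*x - b_i)^2, minimize over (x, u) the objective sum_i (a_i*x - (b_i + u_i))^2 + theta * sum_i |u_i|, so the model may "repair" a corrupted observation at a price. The alternating scheme and all names below are illustrative assumptions, not the paper's algorithm:

```python
# Sketch of a joint decision/perturbation relaxation for outlier-tolerant
# least squares. theta prices the perturbation u of the labels.

def soft_threshold(r, t):
    # closed-form argmin over u of (r - u)^2 + t*|u|
    if r > t / 2.0:
        return r - t / 2.0
    if r < -t / 2.0:
        return r + t / 2.0
    return 0.0

def joint_fit(data, theta, iters=100):
    x = 0.0
    u = [0.0] * len(data)
    for _ in range(iters):
        # x-step: exact least squares against the perturbed labels b_i + u_i
        num = sum(a * (b + ui) for (a, b), ui in zip(data, u))
        den = sum(a * a for a, _ in data)
        x = num / den
        # u-step: each u_i soft-thresholds its residual a_i*x - b_i
        u = [soft_threshold(a * x - b, theta) for (a, b) in data]
    return x, u

# a clean linear trend plus one grossly corrupted label (hypothetical data)
data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (1.0, 10.0)]
x_joint, u_joint = joint_fit(data, theta=2.0)
```

On this data, a plain least-squares fit is pulled to roughly 1.6 by the outlier, while the joint formulation lets the perturbation u absorb most of the corrupted label and recovers a slope near the true value 1.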
Talk 3: Convergence Analyses Under Computable Models of Nonsmoothness
Speaker: Vivak Patel
Abstract: In this talk, we discuss limitations of popular approaches to automatic differentiation for nonsmooth functions, and the issues that they can create for the reliability of gradient-based methods. We then discuss correct ways of computing differentials for nonsmooth functions, and we analyze the behavior of gradient-based methods under these corrected differentials.
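A well-known instance of the limitation the abstract refers to: applying smooth chain-rule logic to nonsmooth primitives can return a value that is not a valid (sub)derivative. The tiny forward-mode AD class below and the convention relu'(0) = 0 are illustrative assumptions, not the talk's construction:

```python
# Minimal forward-mode AD with dual numbers. The function
# identity_via_relu(x) = relu(x) - relu(-x) equals x everywhere, so its
# true derivative is 1 everywhere; yet the usual AD rule relu'(0) = 0
# makes AD report derivative 0 at x = 0.

class Dual:
    def __init__(self, val, dot):
        self.val = val   # function value
        self.dot = dot   # directional derivative carried alongside

    def __sub__(self, other):
        return Dual(self.val - other.val, self.dot - other.dot)

    def __neg__(self):
        return Dual(-self.val, -self.dot)

def relu(x):
    # common AD rule: propagate the tangent only where the value is positive
    if x.val > 0:
        return Dual(x.val, x.dot)
    return Dual(0.0, 0.0)

def identity_via_relu(x):
    # mathematically identical to x for every real input
    return relu(x) - relu(-x)

d_at_zero = identity_via_relu(Dual(0.0, 1.0)).dot  # AD answer at x = 0
d_at_one = identity_via_relu(Dual(1.0, 1.0)).dot   # AD answer at x = 1
```

Away from the kink the AD output is correct, but at x = 0 it returns 0, which is not the derivative of the identity and not even an element of its (singleton) subdifferential, illustrating why corrected notions of computable differentials are needed.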