Session: Nonsmooth Optimization - Theory and Algorithms
Chair: Shaoning Han
Talk 1: Analysis of a Class of Heaviside Composite Minimization Problems
Speaker: Shaoning Han
Abstract: The minimization of nonclosed functions is a difficult topic that has received little study. Among such functions is the Heaviside composite function, the composition of a Heaviside function with a possibly nonsmooth multivariate function. A Heaviside composite optimization problem, which unifies a statistical estimation problem with hierarchical variable selection and a sample average approximation of composite chance-constrained stochastic programs, is one whose objective and constraints are defined by sums of possibly nonlinear multiples of such composite functions. In this talk, we first present the difficulties of tackling such problems with existing optimization techniques. We then establish stationarity conditions for the problem and introduce a sufficient condition, termed the local convex-like property, under which the proposed stationary point is a local minimizer. Finally, we briefly discuss solution strategies via lifting and reformulation techniques. For details of this work, please refer to Han S, Cui Y, Pang J-S (2024), "Analysis of a class of minimization problems lacking lower semicontinuity," Mathematics of Operations Research.
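As a point of orientation, a Heaviside composite problem of the kind described above can be written schematically as follows. This formulation is an illustration only, not the formulation of the cited paper; the functions $f$, $c_i$, $d_j$, $g_i$, $h_j$ and the single constraint are assumed for concreteness.

\[
\min_{x \in X}\;\; f(x) \;+\; \sum_{i=1}^{m} c_i(x)\, H\!\bigl(g_i(x)\bigr)
\qquad \text{s.t.} \qquad \sum_{j=1}^{p} d_j(x)\, H\!\bigl(h_j(x)\bigr) \;\le\; b,
\]

where $H$ is the Heaviside (step) function, equal to $1$ for positive arguments and $0$ for negative ones (its value at $0$ is a modeling choice), and $f$, $c_i$, $d_j$, $g_i$, $h_j$ may be nonsmooth. The jump of $H$ at the origin is what can destroy lower semicontinuity of the objective and constraints, which is the nonclosedness the abstract refers to.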
Talk 2: Non-smooth stochastic gradient descent using smoothing functions
Speaker: Jingfu Tan
Abstract: In this talk, we address stochastic optimization problems involving the composition of a non-smooth outer function and a smooth inner function, a formulation frequently encountered in machine learning and operations research. To deal with the non-differentiability of the outer function, we approximate the original non-smooth function by smoothing functions, which are continuously differentiable and approach the original function as a smoothing parameter goes to zero (at the price of increasingly large Lipschitz constants). The proposed smoothing stochastic gradient method iteratively drives the smoothing parameter to zero at a designated rate. We establish convergence guarantees in the strongly convex, convex, and nonconvex settings, proving convergence rates that match known results for non-smooth stochastic compositional optimization. In particular, for convex objectives, the smoothing stochastic gradient method achieves a $1/T^{1/4}$ rate in terms of the number of stochastic gradient evaluations. We further show how general compositional and finite-sum compositional problems (widely used frameworks in large-scale machine learning and risk-averse optimization) satisfy the assumptions needed for these rates (unbiased gradient estimates, bounded second moments, and controlled smoothing errors). We will present numerical results indicating that smoothing stochastic gradient descent is a competitive method for certain classes of problems.
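The smoothing idea described in this abstract can be illustrated with a short sketch. The Python snippet below is a minimal illustration only, not the speakers' algorithm or code: it assumes an absolute-value outer function smoothed as sqrt(t^2 + mu^2), a linear inner function on synthetic data, and simple 1/k^{1/4} and 1/k^{1/2} decay schedules for the smoothing parameter and step size.

# Minimal sketch of a smoothing stochastic gradient method for
# min_x E[ phi(g(x; xi)) ] with non-smooth outer phi and smooth inner g.
# All specifics (phi = |.|, its smoothing, the decay rates, the toy data)
# are illustrative assumptions, not the method presented in the talk.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inner function g(x; a_i, b_i) = a_i.x - b_i (smooth, linear in x).
A = rng.normal(size=(1000, 5))
b = A @ np.ones(5) + 0.1 * rng.normal(size=1000)

def smoothed_abs(t, mu):
    """Smoothing of |t|: sqrt(t^2 + mu^2) -> |t| as mu -> 0."""
    return np.sqrt(t * t + mu * mu)

def smoothed_abs_grad(t, mu):
    """Derivative of the smoothed absolute value."""
    return t / np.sqrt(t * t + mu * mu)

x = np.zeros(5)
T = 5000
for k in range(1, T + 1):
    mu = 1.0 / k ** 0.25      # smoothing parameter driven to zero
    eta = 1.0 / k ** 0.5      # diminishing step size
    i = rng.integers(len(b))  # sample one data point (stochastic gradient)
    r = A[i] @ x - b[i]       # inner function value g(x; xi)
    # Chain rule on the smoothed composition: phi_mu'(g) * grad g
    grad = smoothed_abs_grad(r, mu) * A[i]
    x -= eta * grad

print("final smoothed objective:", smoothed_abs(A @ x - b, mu).mean())

The point of the sketch is the chain-rule gradient of the smoothed composition together with the joint decay of the smoothing parameter and the step size; the rates and assumptions that actually justify such schedules are those established in the talk.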
Talk 3: (Canceled) Convergence analysis of the 9th Chebyshev Method for Nonconvex, Nonsmooth Optimization Problems
Speaker: Jiabao Yang
Abstract: Ushiyama, Sato, and Matsuo (2022) (hereafter [USM]) showed that some optimization methods can be regarded as numerical methods for ordinary differential equations. [USM] proposed the 9th Chebyshev method, an optimization method that allows a larger step width than the steepest descent method, under the assumption that the objective function is convex and differentiable. From the left edge of the stability domain, the 9th Chebyshev method should allow steps about 78 times larger than those of the steepest descent method. On the other hand, the 9th Chebyshev method requires 9 gradient evaluations per iteration, whereas the steepest descent method requires only one. Since most of the computation time is spent on gradient evaluations, we expect the 9th Chebyshev method to converge about 78/9 ≈ 8.7 times faster. However, many practical problems cannot be modeled with differentiable, smooth, or convex functions; for example, problems in image analysis and speech recognition require nondifferentiable, nonsmooth, and nonconvex functions. The steepest descent method and the 9th Chebyshev method of [USM] assume that the objective function is convex and differentiable, and thus cannot be applied to problems whose objective function is not necessarily differentiable. For nonconvex optimization problems, finding a good local optimum is a realistic goal. Riis, Ehrhardt, Quispel, and Schönlieb (2022) (hereafter [REQS]) worked with the Clarke subdifferential, a generalized notion of derivative for functions that are not everywhere differentiable, and proposed an optimization method that can be applied to objective functions that are not necessarily differentiable or convex. Under assumptions that do not require differentiability or convexity of the objective function (such as local Lipschitz continuity), [REQS] prove that their method converges to the set of Clarke stationary points defined via the Clarke subdifferential. In this talk, building on the proof technique of [REQS], we propose a method that combines the method of [REQS] with that of [USM], and we prove that the proposed method converges to the set of Clarke stationary points under the same assumptions on the objective function as [REQS].
References:
[1] Kansei Ushiyama, Shun Sato, Takayasu Matsuo, "Deriving efficient optimization methods based on stable explicit numerical methods," JSIAM Letters, Vol. 14 (2022), pp. 29–32.
[2] Erlend S. Riis, Matthias J. Ehrhardt, G. R. W. Quispel, Carola-Bibiane Schönlieb, "A Geometric Integration Approach to Nonsmooth, Nonconvex Optimisation," Foundations of Computational Mathematics, Vol. 22 (2022), pp. 1351–1394.
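A note on the stationarity notion used in the abstract above (standard background on the Clarke subdifferential, not material specific to [REQS] or to the proposed method): for a locally Lipschitz function $f$ on $\mathbb{R}^n$, Rademacher's theorem guarantees that $f$ is differentiable almost everywhere, and the Clarke subdifferential at $x$ can be written as

\[
\partial_C f(x) \;=\; \operatorname{conv}\Bigl\{\, \lim_{k \to \infty} \nabla f(x_k) \;:\; x_k \to x,\ f \text{ differentiable at } x_k \,\Bigr\}.
\]

A point $x$ is Clarke stationary when $0 \in \partial_C f(x)$; the convergence results of [REQS] and of the proposed method are stated with respect to the set of such points.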