Session: Approximating derivatives in derivative-free optimization
Chair: Geovani Grapiglia
Cluster: Derivative-free Optimization
Talk 1: TRFD: A derivative-free trust-region method based on finite differences for composite nonsmooth optimization
Speaker: Geovani Grapiglia
Abstract: In this work we present TRFD, a derivative-free trust-region method based on finite differences for minimizing composite functions of the form $f(x)=h(F(x))$, where $F$ is a black-box function assumed to have a Lipschitz continuous Jacobian, and $h$ is a known convex Lipschitz function, possibly nonsmooth. The method approximates the Jacobian of $F$ via forward finite differences. We establish an upper bound on the number of evaluations of $F$ that TRFD requires to find an $\epsilon$-approximate stationary point. For $\ell_1$ and minimax problems, we show that our complexity bound reduces to $\mathcal{O}(n\epsilon^{-2})$ for specific instances of TRFD, where $n$ is the number of variables. Assuming that $h$ is monotone and that the components of $F$ are convex, we also establish a worst-case complexity bound, which reduces to $\mathcal{O}(n\epsilon^{-1})$ for minimax problems. Numerical results illustrate the relative efficiency of TRFD in comparison with existing derivative-free solvers for composite nonsmooth optimization.
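As a rough illustration of the Jacobian model underlying TRFD, the sketch below builds a forward finite-difference Jacobian approximation. The function name fd_jacobian, the step size h, and the test function are illustrative assumptions, not taken from the TRFD implementation; note the $n$ extra evaluations of $F$ per Jacobian, which matches the factor of $n$ in the complexity bounds.

    import numpy as np

    def fd_jacobian(F, x, h=1e-8):
        """Forward finite-difference Jacobian of a black-box map F at x.

        Costs n extra evaluations of F (one per coordinate), the source
        of the factor n in the evaluation-complexity bounds.
        """
        x = np.asarray(x, dtype=float)
        Fx = np.atleast_1d(F(x))
        J = np.empty((Fx.size, x.size))
        for i in range(x.size):
            step = np.zeros(x.size)
            step[i] = h
            J[:, i] = (np.atleast_1d(F(x + step)) - Fx) / h
        return J

    # Example: F(x) = (x0^2, x0*x1); the exact Jacobian at (1, 2) is [[2, 0], [2, 1]].
    J = fd_jacobian(lambda x: np.array([x[0]**2, x[0]*x[1]]), np.array([1.0, 2.0]))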
Talk 2: Sparse gradients in derivative-free optimization
Speaker: Daniel McKenzie
Abstract: In this talk I’ll survey some of my recent work on extending DFO to the high-dimensional regime, primarily by exploiting sparsity in gradients to construct good gradient approximations cheaply.
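One common way to exploit gradient sparsity, shown here only as a hedged sketch and not necessarily the speaker's method, is to recover the gradient from $m \ll n$ random directional derivatives via $\ell_1$-regularized regression (a compressed-sensing-style estimator). The name sparse_grad_estimate and the parameters m, h, and alpha are illustrative.

    import numpy as np
    from sklearn.linear_model import Lasso

    def sparse_grad_estimate(f, x, m, h=1e-6, alpha=1e-4, seed=None):
        """Recover an (assumed) sparse gradient from m << n samples.

        Rows of A are random directions; finite differences give noisy
        measurements y_i ~ a_i^T grad f(x). If the gradient is s-sparse,
        compressed sensing suggests on the order of s log n samples
        suffice for l1-regularized recovery.
        """
        rng = np.random.default_rng(seed)
        n = x.size
        A = rng.standard_normal((m, n)) / np.sqrt(m)
        y = np.array([(f(x + h * a) - f(x)) / h for a in A])
        return Lasso(alpha=alpha, fit_intercept=False).fit(A, y).coef_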
Talk 3: How the sampling distribution affects gradient-free optimization
Speaker: Liyuan Cao
Abstract: Gradient-free optimization is often used to optimize functions for which only function values are accessible. It primarily refers to gradient-descent-type methods in which the "gradients" are built from directional derivatives estimated via finite differences along randomly sampled directions. The sampling distribution is commonly chosen to be Gaussian, uniform on the sphere, Haar, or uniform over the coordinate directions. This choice has been shown to affect the algorithms' numerical efficiency. This talk presents a theoretical analysis of the algorithms' performance under different sampling distributions, providing guidance on how to choose the distribution.
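To make the role of the sampling distribution concrete, here is a minimal sketch, under the usual smoothing-estimator assumptions, of a random finite-difference gradient estimator with interchangeable direction distributions; the scale factors keep each estimator approximately unbiased as $h \to 0$. The function name and defaults are illustrative, not from the talk.

    import numpy as np

    def gradient_estimate(f, x, num_samples=10, h=1e-6, dist="gaussian", seed=None):
        """Random finite-difference gradient estimator.

        Directions u are drawn from the chosen distribution; the factor
        'scale' keeps scale * (f(x + h*u) - f(x)) / h * u approximately
        unbiased for grad f(x) as h -> 0:
          "gaussian":   u ~ N(0, I),                  scale = 1
          "sphere":     u uniform on the unit sphere, scale = n
          "coordinate": u = e_i with i uniform,       scale = n
        (Haar sampling instead uses the orthonormal columns of a
        Haar-distributed orthogonal matrix as a batch of directions.)
        The distributions differ in the variance of the estimator, which
        is what drives the efficiency differences analyzed in the talk.
        """
        rng = np.random.default_rng(seed)
        n = x.size
        g = np.zeros(n)
        for _ in range(num_samples):
            if dist == "gaussian":
                u, scale = rng.standard_normal(n), 1.0
            elif dist == "sphere":
                u = rng.standard_normal(n)
                u /= np.linalg.norm(u)
                scale = float(n)
            elif dist == "coordinate":
                u = np.zeros(n)
                u[rng.integers(n)] = 1.0
                scale = float(n)
            else:
                raise ValueError(f"unknown distribution: {dist}")
            g += scale * (f(x + h * u) - f(x)) / h * u
        return g / num_samples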