Session: Approximating derivatives in derivative-free optimization
Chair: Liyuan Cao
Cluster: Derivative-free Optimization
Talk 1: Enhancing finite-difference-based derivative-free optimization with machine learning
Speaker: Timothé Taminiau
Abstract: Derivative-Free Optimization (DFO) involves methods that rely solely on evaluations of the objective function. One of the earliest strategies for designing DFO methods is to adapt first-order methods by replacing gradients with finite-difference approximations. The execution of such methods generates a rich dataset about the objective function, including iterate points, function values, approximate gradients, and successful step sizes. In this work, we propose a simple auxiliary procedure to leverage this dataset and enhance the performance of finite-difference-based DFO methods. Specifically, our procedure trains a surrogate model using the available data and applies the gradient method with Armijo line search to the surrogate until it fails to ensure sufficient decrease in the true objective function; at that point we revert to the original algorithm and improve the surrogate based on the newly available information. As a proof of concept, we integrate this procedure with the derivative-free method proposed in (Optim. Lett. 18: 195--213, 2024). Numerical results demonstrate significant performance improvements, particularly when the approximate gradients are also used to train the surrogates.
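As a rough illustration of the auxiliary procedure described in the abstract, the sketch below (my own illustration, not the authors' implementation; the diagonal quadratic surrogate, function names, and parameters are assumptions chosen for brevity) fits a surrogate to the evaluations collected so far and takes Armijo-backtracked gradient steps on it, checking sufficient decrease on the true objective and returning control to the underlying finite-difference method as soon as the check fails.

```python
import numpy as np

def fit_quadratic_surrogate(X, y):
    """Least-squares fit of m(x) = c + g.x + 0.5*sum_i h_i*x_i^2.
    A diagonal quadratic is a simplifying assumption to keep the sketch short;
    the actual work may use a richer, learned surrogate."""
    X, y = np.asarray(X), np.asarray(y)
    n = X.shape[1]
    Phi = np.hstack([np.ones((len(X), 1)), X, 0.5 * X**2])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    c, g, h = coef[0], coef[1:n + 1], coef[n + 1:]
    return (lambda x: c + g @ x + 0.5 * h @ x**2,   # surrogate value
            lambda x: g + h * x)                    # analytic surrogate gradient

def surrogate_phase(f, x, fx, data_X, data_y, c1=1e-4, max_steps=20):
    """Gradient steps on the surrogate with Armijo backtracking checked on the
    TRUE objective f; hand control back to the finite-difference method as
    soon as sufficient decrease fails."""
    _, m_grad = fit_quadratic_surrogate(data_X, data_y)
    for _ in range(max_steps):
        d = -m_grad(x)                               # surrogate steepest-descent direction
        t, accepted = 1.0, False
        while t > 1e-10:                             # Armijo backtracking
            x_try = x + t * d
            f_try = f(x_try)
            data_X.append(x_try); data_y.append(f_try)   # every evaluation enriches the dataset
            if f_try <= fx - c1 * t * (d @ d):       # sufficient decrease on the true f
                x, fx, accepted = x_try, f_try, True
                break
            t *= 0.5
        if not accepted:
            break                                    # revert to the original DFO algorithm
    return x, fx, data_X, data_y
```

In the actual method the approximate gradients would also enter the surrogate fit (for example as additional least-squares equations), which the abstract reports to be particularly effective.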
Talk 2: Sparse gradients in derivative-free optimization
Speaker: Daniel McKenzie
Abstract: In this talk I’ll survey some of my recent work on extending DFO to the high-dimensional regime, primarily by exploiting sparsity in gradients to construct good gradient approximations cheaply.
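A minimal sketch of the general idea (my own illustration under standard compressed-sensing assumptions, not the speaker's method; all names and parameters are hypothetical): each random direction yields one directional derivative, i.e. one linear measurement of the gradient, and an l1-regularized least-squares problem can recover a sparse gradient from far fewer measurements than the dimension.

```python
import numpy as np

def sparse_gradient_estimate(f, x, m, h=1e-5, lam=1.0, iters=2000, rng=None):
    """Estimate a (presumed sparse) grad f(x) from m << n finite differences.
    Each random direction u gives one measurement u^T grad f(x); an
    l1-regularized least-squares (LASSO) problem, solved here by plain ISTA,
    reconstructs the sparse gradient."""
    rng = rng or np.random.default_rng(0)
    n, fx = len(x), f(x)
    U = rng.standard_normal((m, n))                      # m random directions
    b = np.array([(f(x + h * u) - fx) / h for u in U])   # directional derivatives
    g = np.zeros(n)
    L = np.linalg.norm(U, 2) ** 2                        # ISTA step size is 1/L
    for _ in range(iters):
        z = g - U.T @ (U @ g - b) / L                    # gradient step on 0.5*||Ug - b||^2
        g = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold (l1 prox)
    return g

# Tiny demo: a 100-dimensional function whose gradient has only 3 nonzeros.
if __name__ == "__main__":
    support = [7, 30, 81]
    f = lambda x: sum(np.sin(x[i]) for i in support)
    g_hat = sparse_gradient_estimate(f, np.zeros(100), m=30)
    print(np.argsort(-np.abs(g_hat))[:3])   # typically recovers the support from ~30 evaluations
```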
Talk 3: How sampling distributions affect gradient-free optimization
Speaker: Liyuan Cao
Abstract: Gradient-free optimization is often used to optimize functions for which only function values are accessible. It primarily refers to gradient descent methods in which the "gradients" are directional derivatives estimated via finite differences at randomly sampled points. The sampling distribution is often chosen to be Gaussian, uniform on the sphere, Haar, or uniform over the coordinate directions, and this choice has been shown to affect the algorithms' numerical efficiency. This talk presents a theoretical analysis of the algorithms' performance under different sampling distributions. The results provide guidance on how to choose the distribution.
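For readers unfamiliar with the setup, the sketch below (my own illustration, not the speaker's analysis; the Haar case is omitted, and the scaling convention and names are assumptions) shows the same forward-difference estimator under three of the sampling distributions mentioned, each scaled so that the estimator is approximately unbiased.

```python
import numpy as np

def sample_directions(m, n, dist, rng):
    """Directions scaled so that E[u u^T] = I for every distribution,
    which lets the same estimation formula be used below."""
    if dist == "gaussian":                       # u ~ N(0, I)
        return rng.standard_normal((m, n))
    if dist == "sphere":                         # uniform on the sphere of radius sqrt(n)
        U = rng.standard_normal((m, n))
        return np.sqrt(n) * U / np.linalg.norm(U, axis=1, keepdims=True)
    if dist == "coordinate":                     # sqrt(n) * e_j with coordinate j uniform
        return np.sqrt(n) * np.eye(n)[rng.integers(0, n, size=m)]
    raise ValueError(dist)

def fd_gradient(f, x, m, dist="gaussian", h=1e-6, rng=None):
    """Average of m single-direction forward-difference estimators of grad f(x)."""
    rng = rng or np.random.default_rng(0)
    fx = f(x)
    U = sample_directions(m, len(x), dist, rng)
    return sum((f(x + h * u) - fx) / h * u for u in U) / m

# Comparing the three distributions on a simple quadratic:
if __name__ == "__main__":
    f = lambda x: 0.5 * x @ x                    # true gradient is x itself
    x0 = np.arange(1.0, 6.0)
    for dist in ("gaussian", "sphere", "coordinate"):
        g = fd_gradient(f, x0, m=200, dist=dist)
        print(dist, np.linalg.norm(g - x0))      # estimation error under each distribution
```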