Session: Machine Learning - Privacy and Learning Theory
Chair: Devansh Gupta
Talk 1: Inherent Privacy of Zeroth Order Projected Gradient Descent
Speaker: Devansh Gupta
Abstract: Differentially private zeroth-order optimization methods have recently gained popularity in private fine-tuning of machine learning models due to their reduced memory requirements. Current approaches for privatizing zeroth-order methods rely on adding Gaussian noise to the estimated zeroth-order gradients. However, since the search direction in zeroth-order methods is inherently random, researchers including Tang et al. [1] and Zhang et al. [2] have raised an important question: is the inherent noise in zeroth-order estimators sufficient to ensure the overall differential privacy of the algorithm? This work settles this question for a class of oracle-based optimization algorithms where the oracle returns zeroth-order gradient estimates. In particular, we show that for a fixed initialization, there exist strongly convex objective functions such that running (Projected) Zeroth-Order Gradient Descent (ZO-GD) is not differentially private. Furthermore, we show that even with random initialization and without revealing intermediate iterates, the privacy loss in ZO-GD can grow superlinearly with the number of iterations when minimizing convex objective functions. [1] Tang, X., Panda, A., Nasr, M., Mahloujifar, S., and Mittal, P. (2024). Private fine-tuning of large language models with zeroth-order optimization. [2] Zhang, L., Li, B., Thekumparampil, K. K., Oh, S., and He, N. (2024). DPZero: Private fine-tuning of language models without backpropagation. In International Conference on Machine Learning. PMLR.
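For context, the algorithm family under study can be sketched as projected gradient descent whose only gradient information comes from a two-point zeroth-order estimate, so the random search direction is the sole source of noise. The sketch below is a minimal illustration assuming a Euclidean-ball constraint and Gaussian smoothing; the names zo_projected_gd, mu, and lr are illustrative and not taken from the paper.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def zo_projected_gd(f, x0, steps=100, mu=1e-3, lr=1e-2, radius=1.0, rng=None):
    """Projected gradient descent driven by a two-point zeroth-order
    gradient estimate; the random direction u is the only randomness."""
    rng = np.random.default_rng() if rng is None else rng
    x = project_ball(np.asarray(x0, dtype=float), radius)
    for _ in range(steps):
        u = rng.standard_normal(x.shape)                     # random search direction
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u   # zeroth-order gradient estimate
        x = project_ball(x - lr * g, radius)                 # projected descent step
    return x

# Example: minimize a strongly convex quadratic over the unit ball.
f = lambda x: 0.5 * np.dot(x - 0.3, x - 0.3)
print(zo_projected_gd(f, x0=np.ones(5), steps=500))
```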
Talk 2: Near-Optimal and Tractable Estimation under Shift-Invariance
Speaker: Dmitrii Ostrovskii
Abstract: In the 1990s, Arkadi Nemirovski asked the following question: How hard is it to estimate a sequence of length N satisfying an unknown linear recurrence relation of order S and observed in i.i.d. Gaussian noise? The class of all such sequences is parametric but extremely rich: it contains all exponential polynomials with total degree S, including harmonic oscillations with S arbitrary frequencies. Geometrically, this class corresponds to the projection onto R^N of the union of all shift-invariant subspaces of R^Z of dimension S. In this work, we show that the statistical complexity of this class, as measured by the squared minimax radius of the (1−P)-confidence Euclidean ball, is nearly the same as for the class of S-sparse signals, namely (S\log(N) + \log(1/P)) \log^2(S) \log(N/S), up to a constant factor. Moreover, the corresponding near-minimax estimator is tractable, and it can be used to build a test statistic with a near-minimax detection threshold in the associated detection problem. These statistical results rest upon an approximation-theoretic one: we show that finite-dimensional shift-invariant subspaces admit compactly supported reproducing kernels whose Fourier spectra have the smallest possible p-norms, simultaneously for all p >= 1.
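To make the observation model concrete, the sketch below generates a sequence obeying an order-S linear recurrence and observes it in i.i.d. Gaussian noise, using a damped harmonic oscillation as an example of such a sequence. It illustrates only the data model, not the estimator from the talk; the names recurrence_signal, coeffs, and noise_std are assumptions for the example.

```python
import numpy as np

def recurrence_signal(coeffs, init, n, noise_std=0.1, rng=None):
    """Generate x_t = coeffs[0]*x_{t-1} + ... + coeffs[S-1]*x_{t-S}
    for t >= S, then observe it in i.i.d. Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    s = len(coeffs)
    x = np.empty(n)
    x[:s] = init
    for t in range(s, n):
        x[t] = np.dot(coeffs, x[t - s:t][::-1])   # linear recurrence of order S
    return x + noise_std * rng.standard_normal(n)

# Example: a damped harmonic oscillation x_t = rho^t cos(omega*t) satisfies the
# order-2 recurrence x_t = 2*rho*cos(omega)*x_{t-1} - rho**2 * x_{t-2}.
rho, omega = 0.99, 0.3
y = recurrence_signal([2 * rho * np.cos(omega), -rho ** 2],
                      init=[1.0, rho * np.cos(omega)], n=200)
```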
Talk 3: On stochastic mirror descent under relative nonconvexity and nonsmoothness
Speaker: Philip Thompson
Abstract: In this talk, we review recent convergence analyses of the stochastic mirror descent method and present a novel convergence analysis within the framework of relative variation (e.g., relative Lipschitz continuity and relative smoothness) and relative notions of the Polyak–Łojasiewicz (PL) inequality. We also present some applications in machine learning.
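As a concrete instance of the method being analyzed, the sketch below runs stochastic mirror descent on the probability simplex with the negative-entropy mirror map, which yields multiplicative (exponentiated-gradient) updates. The choice of mirror map, the least-squares example, and names such as stochastic_mirror_descent and grad_sample are illustrative assumptions, not details from the talk.

```python
import numpy as np

def stochastic_mirror_descent(grad_sample, x0, steps=1000, lr=0.1, rng=None):
    """Stochastic mirror descent on the probability simplex with the
    negative-entropy mirror map (exponentiated-gradient updates)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    x = x / x.sum()                        # start on the simplex
    for _ in range(steps):
        g = grad_sample(x, rng)            # unbiased stochastic (sub)gradient
        x = x * np.exp(-lr * g)            # mirror step in the dual space
        x = x / x.sum()                    # map back to the simplex
    return x

# Example: minimize E[(A x - b)^2] over the simplex with noisy gradients.
A, b = np.array([[1.0, 2.0, 3.0]]), np.array([2.0])

def grad_sample(x, rng):
    noise = 0.1 * rng.standard_normal(x.shape)
    return 2 * A.T @ (A @ x - b) + noise

print(stochastic_mirror_descent(grad_sample, x0=np.ones(3)))
```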