Session: New Riemannian Optimization Applications
Chair: Jiang Hu
Cluster: Optimization on Manifolds
Talk 1: Retraction-free optimization over the Stiefel manifold with application to the LoRA fine-tuning
Speaker: Jiang Hu
Abstract: Optimization over the Stiefel manifold plays a significant role in various machine learning tasks. Many existing algorithms either use a retraction operator to keep each iterate on the manifold, or solve an unconstrained quadratically penalized problem. The retraction operator in the former amounts to orthonormalizing matrices and can be computationally costly for large-scale matrices. The latter approach typically requires an unknown, large penalty parameter. To address these issues, we propose a retraction-free and penalty-parameter-free algorithm that lands on the manifold. Moreover, our convergence theory allows a constant step size, which improves on the result of converging only to a neighborhood in Ablin and Peyré (2022). A key component of the analysis is a convex-like property of the quadratic penalty of the Stiefel manifold, which enables us to explicitly characterize the constant penalty parameter. As an application, we introduce a new algorithm, Manifold-LoRA, which employs the landing technique and a carefully designed step size strategy to accelerate low-rank adaptation (LoRA) in fine-tuning large language models. Numerical experiments on benchmark datasets demonstrate the efficiency of the proposed method.
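
Below is a minimal NumPy sketch of a landing-type iteration of the kind the abstract describes: each update combines a skew-symmetric descent component with the gradient of the quadratic penalty N(X) = ||X^T X - I||_F^2 / 4, so no retraction (orthonormalization) is ever performed. The step size eta, penalty weight lam, toy objective, and iteration count are illustrative assumptions, not the Manifold-LoRA algorithm or the step size strategy from the talk.

```python
import numpy as np

def landing_step(X, grad_f, eta, lam):
    """One retraction-free 'landing' update for min f(X) over the Stiefel manifold.

    X      : current iterate, shape (n, p); need not be exactly feasible
    grad_f : Euclidean gradient of f at X, shape (n, p)
    eta    : constant step size (illustrative value)
    lam    : weight on the quadratic penalty N(X) = ||X^T X - I||_F^2 / 4
    """
    # Skew-symmetric "relative gradient" component, which moves roughly along the manifold.
    psi = 0.5 * (grad_f @ X.T - X @ grad_f.T)
    # Gradient of the quadratic penalty, which pulls the iterate back toward feasibility.
    penalty_grad = X @ (X.T @ X - np.eye(X.shape[1]))
    return X - eta * (psi @ X + lam * penalty_grad)

# Toy usage: maximize trace(X^T A X) over St(n, p), i.e. approximate a leading eigenspace.
rng = np.random.default_rng(0)
n, p = 50, 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
A /= np.linalg.norm(A, 2)                       # normalize so a constant step size is safe
X = np.linalg.qr(rng.standard_normal((n, p)))[0]
for _ in range(2000):
    X = landing_step(X, -2.0 * A @ X, eta=0.1, lam=1.0)
print("feasibility error:", np.linalg.norm(X.T @ X - np.eye(p)))
print("objective value  :", np.trace(X.T @ A @ X))
```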
Talk 2: Optimal Tensor Network Disentanglement via Manifold Optimization
Speaker: Chao Yang
Abstract: A tensor network can be disentangled by performing a unitary gauge transformation within the network so that the transformed network can be approximated by a low-rank decomposition. Seeking a unitary transformation that minimizes the truncation error is equivalent to solving a constrained optimization problem whose optimal solution lies on a Stiefel manifold. We describe the objective function for achieving disentanglement and show how the problem can be solved by a Riemannian Newton's method. We also discuss practical issues such as the choice of a starting guess, the stopping criterion, and how the gradient and Hessian can be computed efficiently.
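
As a rough illustration of the kind of objective involved, the sketch below applies a unitary Q to two legs of a small order-4 tensor and measures the truncation error of a rank-r cut across a different bipartition; a Riemannian Newton's method would minimize this quantity over Q on the unitary (Stiefel) manifold. The leg ordering, the choice of cut, and the rank r are illustrative assumptions, not the speakers' exact formulation.

```python
import numpy as np

def truncation_error(Q, T, r):
    """Squared truncation error after applying a 'disentangling' unitary Q.

    T : order-4 tensor with legs (a, b, c, d); Q, of shape (b*c, b*c), acts jointly
        on the middle legs (b, c).  The transformed tensor is then cut between
        (a, b) and (c, d), and the error is the weight of the singular values
        discarded beyond rank r.
    """
    a, b, c, d = T.shape
    M = T.transpose(1, 2, 0, 3).reshape(b * c, a * d)   # group the legs Q acts on
    M = Q @ M                                           # unitary gauge transformation
    M = (M.reshape(b, c, a, d).transpose(2, 0, 1, 3)    # regroup for the (a,b)|(c,d) cut
           .reshape(a * b, c * d))
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(s[r:] ** 2)

# Toy usage: compare the identity gauge with a random unitary gauge.
rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 4, 3))
Q = np.linalg.qr(rng.standard_normal((16, 16)))[0]
print(truncation_error(np.eye(16), T, r=4), truncation_error(Q, T, r=4))
```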
Talk 3: A projected semismooth Newton method for a class of nonconvex composite programs with strong prox-regularity
Speaker: Jiayuan Wu
Abstract: This paper develops a Newton-type method for a class of nonconvex composite programs in which the nonsmooth part may itself be nonconvex. To tackle the nonconvexity, we introduce a notion of strong prox-regularity, related to the single-valuedness and Lipschitz continuity of the associated proximal operator, and verify it for several classes of functions, including weakly convex functions, indicator functions of proximally smooth sets, and two specific sphere-related nonconvex nonsmooth functions. The resulting problem class covers smooth optimization problems on manifolds and certain composite optimization problems on manifolds; for the latter, the proposed algorithm is the first second-order-type method. Combining strong prox-regularity with the semismoothness of the proximal operator, we design a projected semismooth Newton method that finds a root of the natural residual induced by the proximal gradient method. Because the feasible domain may be nonconvex, an extra projection is added to the usual semismooth Newton step, and new criteria are proposed for switching between the projected semismooth Newton step and the proximal step. Global convergence is established under strong prox-regularity, and local superlinear convergence is established under the BD regularity condition. Numerical experiments demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
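
The sketch below illustrates, on a toy sphere-constrained problem, the two ingredients the abstract combines: the natural residual of the proximal gradient map, and a Newton-type step on that residual followed by an extra projection, with a fallback to the proximal step when the Newton step does not sufficiently reduce the residual. The particular Jacobian element, step size t, and acceptance factor nu are simplifications for illustration, not the paper's exact switching criteria.

```python
import numpy as np

def prox_sphere(y):
    """Projection onto the unit sphere, i.e. the prox of its indicator function."""
    return y / np.linalg.norm(y)

def residual(x, A, t):
    """Natural residual of the proximal gradient map for min 0.5*x'Ax s.t. ||x|| = 1."""
    return x - prox_sphere(x - t * (A @ x))

def projected_semismooth_newton(A, x0, t=0.1, tol=1e-10, max_iter=100, nu=0.9):
    """Illustrative projected semismooth-Newton loop for the sphere-constrained toy problem."""
    x = prox_sphere(x0)
    n = len(x)
    for _ in range(max_iter):
        F = residual(x, A, t)
        if np.linalg.norm(F) < tol:
            break
        y = x - t * (A @ x)
        yhat = y / np.linalg.norm(y)
        # An element of the (generalized) Jacobian of the residual map at x.
        JP = (np.eye(n) - np.outer(yhat, yhat)) / np.linalg.norm(y)
        JF = np.eye(n) - JP @ (np.eye(n) - t * A)
        try:
            d = np.linalg.solve(JF, -F)
            x_newton = prox_sphere(x + d)          # Newton step with an extra projection
            accept = np.linalg.norm(residual(x_newton, A, t)) <= nu * np.linalg.norm(F)
        except np.linalg.LinAlgError:
            accept = False
        # Switch: fall back to a proximal (gradient) step if the Newton step is rejected.
        x = x_newton if accept else prox_sphere(x - t * (A @ x))
    return x

# Toy usage: the minimizer is an eigenvector for the smallest eigenvalue of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8)); A = A + A.T
x = projected_semismooth_newton(A, rng.standard_normal(8))
print(abs(x @ A @ x - np.linalg.eigvalsh(A)[0]))
```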