Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform

02/04/2020
by   Jun Li, et al.

Strictly enforcing orthonormality constraints on parameter matrices has been shown to be advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) a new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) an implicit vector transport mechanism that combines a projection of the momentum with the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments on CNN training demonstrate that both algorithms: (a) use less running time per iteration than existing approaches that enforce orthonormality of CNN parameters; and (b) achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.
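The core idea of the retraction can be illustrated concretely. The Cayley transform maps a skew-symmetric matrix W to the orthogonal matrix (I - (α/2)W)^{-1}(I + (α/2)W), which moves a point X along the Stiefel manifold; the matrix inverse can be approximated by a fixed-point iteration Y_{k+1} = X + (α/2) W (X + Y_k), which converges when the step size α is small. The sketch below (not the authors' implementation; all names and the choice of α, matrix sizes, and iteration count are illustrative assumptions) shows this iteration in NumPy and compares it against the closed-form transform:

```python
import numpy as np

def cayley_retract_iterative(X, W, alpha, iters=15):
    """Approximate the Cayley retraction
        Y(alpha) = (I - (alpha/2) W)^{-1} (I + (alpha/2) W) X
    via the fixed-point iteration Y_{k+1} = X + (alpha/2) W (X + Y_k).
    Converges when the spectral norm of (alpha/2) W is below 1."""
    Y = X
    for _ in range(iters):
        Y = X + (alpha / 2.0) * W @ (X + Y)
    return Y

rng = np.random.default_rng(0)
n, p = 8, 3
# A point on the Stiefel manifold: a matrix with orthonormal columns.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
# A skew-symmetric update direction (in the paper this is built
# from the Riemannian gradient / momentum; here it is random).
A = rng.standard_normal((n, n))
W = A - A.T
alpha = 0.05  # small step so the iteration converges

Y = cayley_retract_iterative(X, W, alpha)
# Closed-form Cayley transform, for comparison.
I = np.eye(n)
Y_exact = np.linalg.solve(I - (alpha / 2) * W, (I + (alpha / 2) * W) @ X)
```

Because the exact Cayley transform is orthogonal whenever W is skew-symmetric, the iterate Y stays (numerically) on the Stiefel manifold: Y.T @ Y is close to the identity. The iterative form avoids the explicit n-by-n matrix inverse, which is the source of the per-iteration speedup claimed in the abstract.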


Related research

- 05/06/2020: Geoopt: Riemannian Optimization in PyTorch
- 07/25/2019: DEAM: Accumulated Momentum with Discriminative Weight for Stochastic Optimization
- 02/20/2023: Simplifying Momentum-based Riemannian Submanifold Optimization
- 03/29/2023: Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints
- 10/28/2022: Flatter, faster: scaling momentum for optimal speedup of SGD
- 04/09/2023: μ^2-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism
