CWY Parametrization for Scalable Learning of Orthogonal and Stiefel Matrices

04/18/2020
by Valerii Likhosherstov, et al.

In this paper we propose a new approach for optimization over orthogonal groups. We parametrize an orthogonal matrix as a product of Householder reflections. Applying Householder reflections one after another parallelizes poorly. To overcome this, we employ an accumulation scheme called the compact WY (CWY) transform, a compact matrix representation of a series of Householder reflections that can be computed efficiently on highly parallel hardware such as GPUs and TPUs. We further introduce the Truncated CWY (T-CWY), a novel parametrization of the Stiefel manifold. Its complexity estimate is competitive with other methods and, again, it is advantageous when computed on GPUs and TPUs. We apply the proposed parametrizations to train recurrent neural network architectures on neural machine translation and video prediction tasks. We demonstrate superiority over other methods from the literature in both computational cost and learning performance.
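
The abstract describes the CWY idea only in words. The NumPy sketch below compares a compact product against applying the reflections one at a time. It assumes the standard compact WY identity H(v_1) H(v_2) ... H(v_L) = I - U S^{-1} U^T. Here U holds the normalized reflection vectors and S = I/2 + striu(U^T U), with striu denoting the strictly upper-triangular part. The function `truncated_cwy` is a hypothetical reading of T-CWY: use M reflections and keep only the first M columns of the CWY product. The names `householder_product`, `cwy` and `truncated_cwy` are illustrative and are not taken from the authors' code.

```python
import numpy as np


def householder_product(V):
    """Sequential reference: H(v_1) H(v_2) ... H(v_L), one reflection at a time.

    Column V[:, i] is the (unnormalized) vector v_i of the reflection
    H(v_i) = I - 2 v_i v_i^T / ||v_i||^2.
    """
    n, num_refl = V.shape
    Q = np.eye(n)
    for i in range(num_refl):
        v = V[:, i:i + 1]  # keep as an (n, 1) column
        Q = Q - 2.0 * (Q @ v) @ v.T / (v.T @ v)  # right-multiply Q by H(v_i)
    return Q


def cwy(V):
    """Compact WY form of the same product: I - U S^{-1} U^T.

    Only dense matrix products and one small L x L triangular system are
    needed, which is the kind of work GPUs/TPUs parallelize well.
    """
    n, num_refl = V.shape
    U = V / np.linalg.norm(V, axis=0, keepdims=True)  # unit-norm columns
    S = 0.5 * np.eye(num_refl) + np.triu(U.T @ U, k=1)  # I/2 + striu(U^T U)
    # Solve S X = U^T rather than forming S^{-1} explicitly.
    return np.eye(n) - U @ np.linalg.solve(S, U.T)


def truncated_cwy(V):
    """Hypothetical T-CWY sketch: an N x M matrix with orthonormal columns.

    Assumption: uses M = V.shape[1] reflections and keeps only the first M
    columns of the CWY product, I[:, :M] - U S^{-1} U[:M, :]^T, so the full
    N x N matrix is never built.
    """
    n, m = V.shape
    U = V / np.linalg.norm(V, axis=0, keepdims=True)
    S = 0.5 * np.eye(m) + np.triu(U.T @ U, k=1)
    return np.eye(n, m) - U @ np.linalg.solve(S, U[:m, :].T)


# Usage: the compact form matches the sequential product and is orthogonal.
rng = np.random.default_rng(0)
V = rng.standard_normal((6, 4))
Q = cwy(V)
print(np.allclose(Q, householder_product(V)))  # True
print(np.allclose(Q.T @ Q, np.eye(6)))  # True
W = truncated_cwy(V)
print(W.shape, np.allclose(W.T @ W, np.eye(4)))  # (6, 4) True
```

S is upper triangular with 1/2 on the diagonal, so it is always invertible. The solve can be done by back-substitution, and its cost depends on the number of reflections L rather than on N.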



research
07/29/2017

Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

Recurrent Neural Networks (RNNs) are designed to handle sequential data ...
01/24/2019

Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group

We introduce a novel approach to perform first-order optimization with o...
09/30/2020

One Reflection Suffice

Orthogonal weight matrices are used in many areas of deep learning. Much...
07/29/2016

Recurrent Neural Machine Translation

The vanilla attention-based neural machine translation has achieved prom...
02/15/2021

Fast and accurate optimization on the orthogonal manifold without retraction

We consider the problem of minimizing a function over the manifold of or...
06/19/2020

An Ode to an ODE

We present a new paradigm for Neural ODE algorithms, called ODEtoODE, whe...
