projUNN: efficient method for training deep networks with unitary matrices

by   Bobak Kiani, et al.

In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-k updates – or their rank-k approximation – that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full N-dimensional unitary or orthogonal matrices with a training runtime scaling as O(kN^2). Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting (k=1), projUNN is able to train a model's unitary parameters to reach comparable performances against baseline implementations. By integrating our projUNN algorithm into both recurrent and convolutional neural networks, our models can closely match or exceed benchmarked results from state-of-the-art algorithms.


Algorithms for Efficiently Learning Low-Rank Neural Networks

We study algorithms for learning low-rank neural networks – networks whe...

InRank: Incremental Low-Rank Learning

The theory of greedy low-rank learning (GLRL) aims to explain the impres...

Stack More Layers Differently: High-Rank Training Through Low-Rank Updates

Despite the dominance and effectiveness of scaling, resulting in large n...

Scaling Private Deep Learning with Low-Rank and Sparse Gradients

Applying Differentially Private Stochastic Gradient Descent (DPSGD) to t...

Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network

Training machine learning models on mobile devices has the potential of ...

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models

Overparameterized neural networks generalize well but are expensive to t...

Simultaneous Training of Partially Masked Neural Networks

For deploying deep learning models to lower end devices, it is necessary...

Please sign up or login with your details

Forgot password? Click here to reset