Stack More Layers Differently: High-Rank Training Through Low-Rank Updates

07/11/2023
by Vladislav Lialin, et al.

Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparametrized models remains poorly understood, and alternative approaches do not necessarily make it cheaper to train high-performance models. In this paper, we explore low-rank training techniques as an alternative approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to pre-training transformer language models with up to 350M parameters and demonstrate comparable performance to regular neural network training. Furthermore, we observe that the efficiency of ReLoRA increases with model size, making it a promising approach for training multi-billion-parameter networks efficiently. Our findings shed light on the potential of low-rank training techniques and their implications for scaling laws.
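The abstract does not spell out the ReLoRA procedure, but the core idea of training a high-rank network through a sequence of low-rank (LoRA-style) updates can be sketched as below. This is a minimal illustration, assuming PyTorch; the class name `ReLoRALinear`, the `merge_and_reinit` helper, the reset interval, and the placeholder loss are illustrative assumptions, not the authors' reference implementation. The full method described in the paper additionally prunes most of the optimizer state at each restart and uses a jagged learning-rate schedule, which this sketch simplifies.

```python
# Minimal sketch of a ReLoRA-style layer and training loop (assumptions noted above).
import torch
import torch.nn as nn


class ReLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        # Frozen full-rank weight; only the low-rank factors A and B are trained.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        # B starts at zero, so the effective update B @ A starts at zero.
        nn.init.normal_(self.lora_A, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is the frozen base plus the current low-rank update.
        return x @ (self.weight + self.lora_B @ self.lora_A).T

    @torch.no_grad()
    def merge_and_reinit(self) -> None:
        # Fold the learned low-rank update into the frozen weight, then restart
        # the factors so the next phase can learn a new low-rank direction.
        # Summing several rank-r updates is how the overall change becomes high-rank.
        self.weight += self.lora_B @ self.lora_A
        nn.init.normal_(self.lora_A, std=0.02)
        nn.init.zeros_(self.lora_B)


# Toy training loop with periodic merge-and-restart (hypothetical hyperparameters).
layer = ReLoRALinear(512, 512, rank=8)
opt = torch.optim.AdamW([layer.lora_A, layer.lora_B], lr=1e-3)
reset_every = 1000

for step in range(10_000):
    x = torch.randn(32, 512)
    loss = layer(x).pow(2).mean()  # placeholder objective
    loss.backward()
    opt.step()
    opt.zero_grad()
    if (step + 1) % reset_every == 0:
        layer.merge_and_reinit()
        # The paper prunes most of the optimizer state rather than clearing all
        # of it; clearing it here keeps the sketch short.
        opt.state.clear()
```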


Related research

06/27/2023 - FLuRKA: Fast fused Low-Rank Kernel Attention
Many efficient approximate self-attention techniques have become prevale...

09/27/2022 - Exploring Low Rank Training of Deep Neural Networks
Training deep neural networks in low rank, i.e. with factorised layers, ...

06/16/2021 - Simultaneous Training of Partially Masked Neural Networks
For deploying deep learning models to lower end devices, it is necessary...

02/17/2020 - Low-Rank Bottleneck in Multi-head Attention Models
Attention based Transformer architecture has enabled significant advance...

03/10/2022 - projUNN: efficient method for training deep networks with unitary matrices
In learning with recurrent or very deep feed-forward networks, employing...

08/24/2021 - Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
In this thesis, we introduce Greenformers, a collection of model efficie...

05/30/2023 - Rank-adaptive spectral pruning of convolutional layers during training
The computing cost and memory demand of deep learning pipelines have gro...
