Simultaneous Training of Partially Masked Neural Networks

06/16/2021
by Amirkeivan Mohtashami, et al.

Deploying deep learning models on lower-end devices requires training less resource-demanding variants of state-of-the-art architectures. This does not eliminate the need for the more expensive models, since they achieve higher performance. To avoid training two separate models, we show that it is possible to train a neural network in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance. We extend prior methods, which focused only on core networks of smaller width, to support arbitrary core network architectures. Our proposed training scheme alternates between optimizing only the core part of the network and optimizing the full network. The accuracy of the full model remains comparable, while the core network achieves better performance than when it is trained in isolation. In particular, we show that training a Transformer with a low-rank core yields a low-rank model that outperforms the same low-rank model trained alone. We analyze our training scheme theoretically and prove its convergence under assumptions that are either standard or practically justified. Moreover, we show that the developed theoretical framework allows the analysis of many other partial training schemes for neural networks.
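To make the alternating scheme concrete, below is a minimal PyTorch sketch in which a small fully connected network is trained while alternating between core-only steps and full-network steps. The core here is a hypothetical smaller-width subnetwork (the first 64 hidden units), defined by binary parameter masks; the model, mask construction, and hyperparameters are illustrative assumptions, not the authors' implementation (which also covers cores such as low-rank Transformer layers).

```python
# Minimal sketch of alternating core/full training, assuming PyTorch.
# All specifics (toy model, mask choice, learning rate) are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A toy full network; the paper's experiments use larger models (e.g. Transformers).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Hypothetical core: keep only the first 64 hidden units (a smaller-width core).
# mask == 1 marks core parameters, mask == 0 marks parameters outside the core.
masks = {}
for name, p in model.named_parameters():
    m = torch.ones_like(p)
    if name == "0.weight":
        m[64:, :] = 0.0   # hidden units beyond the core width
    elif name == "0.bias":
        m[64:] = 0.0
    elif name == "2.weight":
        m[:, 64:] = 0.0   # output weights reading from masked hidden units
    masks[name] = m

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def core_step(x, y):
    """One optimization step on the core subnetwork only."""
    # Temporarily zero the non-core weights so the forward pass sees only the core.
    saved = {name: p.detach().clone() for name, p in model.named_parameters()}
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.mul_(masks[name])
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        # Restore the full weights (core entries were unchanged by the masking),
        # then restrict the gradient so only core parameters get updated.
        for name, p in model.named_parameters():
            p.copy_(saved[name])
            p.grad.mul_(masks[name])
    opt.step()
    return loss.item()

def full_step(x, y):
    """One ordinary optimization step on the full network."""
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()

# Alternate between core-only and full-network steps (random toy data for brevity).
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for step in range(100):
    if step % 2 == 0:
        core_step(x, y)
    else:
        full_step(x, y)
```

After training, dropping the masked parameters yields the standalone core model, while keeping all parameters yields the full model; this is the sense in which one training run produces both networks.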

Related research:

05/24/2019  Learning Low-Rank Approximation for CNNs
07/11/2023  Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
09/27/2022  Exploring Low Rank Training of Deep Neural Networks
05/04/2023  Cuttlefish: Low-Rank Model Training without All the Tuning
06/12/2019  Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
03/10/2022  projUNN: efficient method for training deep networks with unitary matrices
06/23/2020  Principal Component Networks: Parameter Reduction Early in Training
