GradMax: Growing Neural Networks using Gradient Information

01/13/2022
by Utku Evci, et al.

The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We achieve the latter by maximizing the gradients of the new weights, finding the optimal initialization efficiently by means of the singular value decomposition (SVD). We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness on a variety of vision tasks and architectures.
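The growth step described in the abstract can be sketched in a few lines. Below is a minimal NumPy illustration, assuming a fully connected layer grown between two existing layers: the outgoing weights of the new neurons are set to zero so the network's function is unchanged, and the incoming weights are chosen via SVD to maximize the gradient norm of those outgoing weights (under a linear approximation of the activation). The function name `gradmax_init` and the argument names are ours, and the batch statistics are assumed to be collected from a separate forward/backward pass.

```python
import numpy as np

def gradmax_init(H_prev, G_next, k, scale=1.0):
    """Sketch of a GradMax-style initialization for k new neurons.

    H_prev: (n, d_in)  activations of the layer below, one row per example
    G_next: (n, d_out) gradients of the loss w.r.t. the pre-activations
            of the layer above
    Returns (W_in, W_out): incoming weights (d_in, k) and zero-initialized
    outgoing weights (k, d_out).
    """
    # Aggregate gradient/activation statistics over the batch.
    M = G_next.T @ H_prev                       # (d_out, d_in)

    # Under a linear approximation, the gradient of the new outgoing
    # weights is proportional to M @ W_in, so its Frobenius norm is
    # maximized (under a norm constraint on W_in) by the top-k right
    # singular vectors of M.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    assert k <= Vt.shape[0], "k cannot exceed rank of the statistics matrix"
    W_in = scale * Vt[:k].T                     # (d_in, k)

    # Zero fan-out: the grown network computes exactly the same function.
    W_out = np.zeros((k, G_next.shape[1]))      # (k, d_out)
    return W_in, W_out
```

As a usage sketch: after one forward/backward pass on a batch, collect the layer-below activations and the layer-above pre-activation gradients, then call `gradmax_init(H_prev, G_next, k=4)` to obtain initial weights for four new neurons.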


Related research

04/18/2022 · Fast optimization of common basis for matrix set through Common Singular Value Decomposition
SVD (singular value decomposition) is one of the basic tools of machine ...

02/16/2023 · Singular Value Representation: A New Graph Perspective On Neural Networks
We introduce the Singular Value Representation (SVR), a new method to re...

06/11/2018 · ATOMO: Communication-efficient Learning via Atomic Sparsification
Distributed model training suffers from communication overheads due to f...

12/07/2020 · A Singular Value Perspective on Model Robustness
Convolutional Neural Networks (CNNs) have made significant progress on s...

01/25/2022 · Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition
Several studies have shown the ability of natural gradient descent to mi...

12/28/2022 · Breaking the Architecture Barrier: A Method for Efficient Knowledge Transfer Across Networks
Transfer learning is a popular technique for improving the performance o...

06/22/2023 · Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation
We develop an approach to efficiently grow neural networks, within which...
