Transferring Learning Trajectories of Neural Networks

05/23/2023
by Daiki Chijiwa, et al.

Training deep neural networks (DNNs) is computationally expensive, which is especially problematic when performing repeated training runs, as in model ensembling or knowledge distillation. Once we have trained one DNN on some dataset, we have its learning trajectory (i.e., the sequence of intermediate parameters produced during training), which may contain useful information for learning the dataset. However, there has been no attempt to utilize the information in a given learning trajectory when training another network. In this paper, we formulate the problem of "transferring" a given learning trajectory from one initial parameter to another, which we call the learning transfer problem, and derive the first algorithm to approximately solve it by matching gradients successively along the trajectory via permutation symmetry. We empirically show that the transferred parameters achieve non-trivial accuracy before any direct training. We also analyze the loss landscape around the transferred parameters, especially from the viewpoint of mode connectivity.
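The core idea, replaying a source trajectory's parameter updates on a new initialization after aligning hidden units by a permutation, can be illustrated with a deliberately simplified sketch. This is not the authors' exact algorithm: it treats the network as a single weight matrix, aligns rows (hidden units) with the Hungarian algorithm instead of matching gradients, and the `transfer_trajectory` and `match_permutation` helpers are hypothetical names introduced here for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_permutation(src, dst):
    """Find a row permutation of `src` that best aligns it with `dst`.

    Simplification: one weight matrix, rows = hidden units; real networks
    need a consistent permutation per layer.
    """
    # Cost = negative inner products between rows; the Hungarian
    # algorithm finds the assignment minimizing total cost.
    cost = -dst @ src.T
    _, perm = linear_sum_assignment(cost)
    return perm

def transfer_trajectory(src_traj, new_init):
    """Replay the source trajectory's updates on a new initialization.

    `src_traj` is a list [theta_0, ..., theta_T] of intermediate weights.
    At each step, the source update is permuted so its hidden units line
    up with the current transferred parameters before being applied.
    """
    phi = new_init.copy()
    transferred = [phi.copy()]
    for t in range(len(src_traj) - 1):
        update = src_traj[t + 1] - src_traj[t]      # one training step's delta
        perm = match_permutation(src_traj[t], phi)  # align units to phi
        phi = phi + update[perm]
        transferred.append(phi.copy())
    return transferred

# Tiny synthetic example: 4 hidden units, 3 inputs, 5 fake SGD steps.
rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))
traj = [theta]
for _ in range(5):
    theta = theta - 0.1 * rng.normal(size=theta.shape)  # stand-in for SGD
    traj.append(theta)

phi0 = rng.normal(size=(4, 3))
result = transfer_trajectory(traj, phi0)
print(len(result), result[-1].shape)
```

In the paper's setting the permutation would instead be chosen to match gradients between the source and target parameters at each step; the sketch above only conveys the "permute, then apply the update" structure.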


Related research

- 04/10/2023, A Survey on Recent Teacher-student Learning Studies
  Knowledge distillation is a method of transferring the knowledge from a ...

- 03/05/2020, Permute to Train: A New Dimension to Training Deep Neural Networks
  We show that Deep Neural Networks (DNNs) can be efficiently trained by p...

- 07/18/2018, Self-supervised Knowledge Distillation Using Singular Value Decomposition
  To solve deep neural network (DNN)'s huge training dataset and its high ...

- 10/12/2022, Efficient Knowledge Distillation from Model Checkpoints
  Knowledge distillation is an effective approach to learn compact models ...

- 11/04/2020, Channel Planting for Deep Neural Networks using Knowledge Distillation
  In recent years, deeper and wider neural networks have shown excellent p...

- 05/07/2019, A Generative Model for Sampling High-Performance and Diverse Weights for Neural Networks
  Recent work on mode connectivity in the loss landscape of deep neural ne...

- 10/26/2022, Comparison of neural closure models for discretised PDEs
  Neural closure models have recently been proposed as a method for effici...
