Module-wise Training of Residual Networks via the Minimizing Movement Scheme

10/03/2022
by Skander Karkar et al.

Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a simple module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. The method, which we call TRGL for Transport Regularized Greedy Learning, is particularly well-adapted to residual networks. We study it theoretically, proving that it leads to greedy modules that are regular and that successively solve the task. Experimentally, we show improved accuracy of module-wise trained networks when our regularization is added.
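The core idea can be sketched in code. Below is a minimal, hypothetical toy illustration (not the paper's actual TRGL implementation): each residual module is trained greedily on the frozen outputs of the previous one, with a transport penalty `lam * ||f(x) - x||^2` added to the task loss so that, in the spirit of the minimizing movement scheme, each module stays close to the identity while still making progress on the task. The linear residual modules, the identity head, and all hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn the map y = 2x with a stack of residual modules.
X = rng.normal(size=(256, 1))
Y = 2.0 * X

def train_module(X_in, Y, lam, steps=500, lr=0.1):
    """Greedily train one residual module f(x) = x + w*x (toy sketch).

    Minimizes: task loss ||f(x) - y||^2 plus the transport (movement)
    penalty lam * ||f(x) - x||^2, which regularizes the module toward
    the identity, as in the minimizing movement scheme.
    """
    w = 0.0
    for _ in range(steps):
        h = X_in + w * X_in           # residual module output
        task = h - Y                  # task residual (identity head)
        move = h - X_in               # transport/movement term
        grad = 2 * np.mean((task + lam * move) * X_in)
        w -= lr * grad
    return w

# Module-wise training: each module is trained, then frozen, and its
# outputs become the inputs of the next module.
lam = 0.5
H = X
for _ in range(3):
    w = train_module(H, Y, lam)
    H = H + w * H

mse = float(np.mean((H - Y) ** 2))
```

Because of the transport penalty, no single module jumps all the way to the target; instead the stack approaches it in small regularized steps, which is the mechanism the regularization uses to keep early modules from overfitting.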


