On layer-level control of DNN training and its impact on generalization

06/05/2018
by   Simon Carbonnelle, et al.

The generalization ability of a neural network depends on the optimization procedure used for training it. For practitioners and theoreticians, it is essential to identify which properties of the optimization procedure influence generalization. In this paper, we observe that prioritizing the training of distinct layers in a network significantly impacts its generalization ability, sometimes causing differences of up to 30% in test accuracy. To monitor and control such prioritization, we propose to define layer-level training speed as the rotation rate of the layer's weight vector (denoted by layer rotation rate hereafter), and develop Layca, an optimization algorithm that enables direct control over it through each layer's learning rate parameter, without being affected by gradient propagation phenomena (e.g. vanishing gradients). We show that controlling layer rotation rates enables Layca to significantly outperform SGD with the same amount of learning rate tuning on three different tasks (up to 10% test accuracy improvement). Furthermore, we provide experiments suggesting that several intriguing observations related to the training of deep models, i.e. the presence of plateaus in learning curves, the impact of weight decay, and the bad generalization properties of adaptive gradient methods, are all due to specific configurations of layer rotation rates. Overall, our work reveals that layer rotation rates are an important factor for generalization, and that monitoring them should be a key component of any deep learning experiment.
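The paper's central quantity is the rotation rate of each layer's weight vector. As a rough illustration (not the paper's exact definition and not the Layca algorithm itself), the sketch below tracks how far a layer's flattened weight vector has rotated away from its initialization, measured as cosine distance; the `layer_rotation` helper and the synthetic weight snapshots are hypothetical names and data introduced only for this example.

```python
import numpy as np

def layer_rotation(w, w0):
    """Cosine distance between a layer's current and initial weight vectors.

    0 means the flattened weight vector has not rotated since initialization;
    values approaching 1 indicate a large rotation.
    """
    w, w0 = w.ravel(), w0.ravel()
    cosine = np.dot(w, w0) / (np.linalg.norm(w) * np.linalg.norm(w0))
    return 1.0 - cosine

# Illustration with synthetic "snapshots" of one layer's weights taken during
# training (hypothetical data; in practice these would be saved checkpoints).
rng = np.random.default_rng(0)
w0 = rng.normal(size=(64, 27))                    # initial weights
history = [w0 + 0.05 * t * rng.normal(size=w0.shape) for t in range(5)]

curve = [layer_rotation(w, w0) for w in history]  # one value per snapshot
print([round(c, 3) for c in curve])
# The per-step increase of this curve reflects the layer's rotation rate.
```

Plotting one such curve per layer gives a quick view of which layers are being prioritized during training, in the spirit of the monitoring the abstract advocates.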


research 02/27/2018
Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation
Stochastic gradient descent (SGD) has achieved great success in training...

research 10/15/2015
Layer-Specific Adaptive Learning Rates for Deep Networks
The increasing complexity of deep learning architectures is resulting in...

research 11/25/2020
Implicit bias of deep linear networks in the large learning rate phase
Correctly choosing a learning rate (scheme) for gradient-based optimizat...

research 03/29/2021
FixNorm: Dissecting Weight Decay for Training Deep Neural Networks
Weight decay is a widely used technique for training Deep Neural Network...

research 10/06/2020
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Recent works (e.g., (Li and Arora, 2020)) suggest that the use of popula...

research 01/10/2020
Tangent-Space Gradient Optimization of Tensor Network for Machine Learning
The gradient-based optimization method for deep machine learning models ...

research 05/26/2022
VectorAdam for Rotation Equivariant Geometry Optimization
The rise of geometric problems in machine learning has necessitated the ...
