The Two Regimes of Deep Network Training

02/24/2020, by Guillaume Leclerc, et al.

Learning rate schedules have a major impact on the performance of deep learning models, yet the choice of a schedule is often heuristic. We aim to develop a precise understanding of the effects of different learning rate schedules and of the appropriate way to select them. To this end, we isolate two distinct phases of training. The first, which we refer to as the "large-step" regime, exhibits rather poor performance from an optimization point of view but is the primary contributor to model generalization; the second, the "small-step" regime, exhibits much more "convex-like" optimization behavior but, used in isolation, produces models that generalize poorly. We find that by treating these regimes separately, and specializing our training algorithm to each of them, we can significantly simplify learning rate schedules.
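A minimal sketch of the idea, assuming the simplest possible specialization: a single switch from a constant "large-step" learning rate to a constant "small-step" one. The function name, switch point, and the two rate values here are illustrative choices, not the paper's actual hyperparameters.

```python
def two_regime_lr(step, switch_step=5000, large_lr=0.1, small_lr=0.001):
    """Illustrative two-regime schedule: a constant large learning rate
    during the early, generalization-driving phase, then a constant small
    learning rate during the later, more convex-like optimization phase.
    """
    return large_lr if step < switch_step else small_lr

# Example: query the schedule at a few training steps.
rates = [two_regime_lr(s) for s in (0, 4999, 5000, 10000)]
```

Such a two-value schedule is the degenerate case of a step-decay schedule; the paper's point is that once the two regimes are handled separately, little more than this is needed.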

Related research:

- REX: Revisiting Budgeted Training with an Improved Schedule (07/09/2021)
- Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima (08/25/2022)
- Disentangling Adaptive Gradient Methods from Learning Rates (02/26/2020)
- The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure (04/29/2019)
- Acceleration via Fractal Learning Rate Schedules (03/01/2021)
- AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop (11/30/2021)
- Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes (09/08/2022)
