AdaS: Adaptive Scheduling of Stochastic Gradients

06/11/2020
by Mahdi S. Hosseini, et al.

The step-size used in Stochastic Gradient Descent (SGD) optimization is empirically selected in most training procedures. Moreover, scheduled learning techniques such as Step Decaying, Cyclical Learning, and Warmup, which are used to tune the step-size, require extensive practical experience, offer limited insight into how the parameters are updated, and are not consistent across applications. This work attempts to answer a question of interest to both researchers and practitioners, namely "how much knowledge is gained in iterative training of deep neural networks?" Answering this question leads to two useful metrics derived from the singular values of the low-rank factorization of convolution layers in deep neural networks, which we call the "knowledge gain" and the "mapping condition". We propose a new algorithm, Adaptive Scheduling (AdaS), that uses these metrics to adapt the SGD learning rate proportionally to the rate of change in knowledge gain over successive iterations. Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) no dependence on a validation set to determine when to stop training. Code is available at <https://github.com/mahdihosseini/AdaS>.
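To make the idea concrete, the sketch below shows one way singular-value metrics of this kind could be computed from a convolution layer's weight tensor and used to scale a per-layer step-size. It is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names, the energy threshold, and the update rule in `adapt_lr` are illustrative choices.

```python
import numpy as np

def knowledge_gain(conv_weight, energy=0.9):
    """Illustrative proxy for 'knowledge gain': unfold a conv kernel of shape
    (out_channels, in_channels, kH, kW) into a 2-D matrix, compute its singular
    values, and return the fraction of the spectrum needed to capture `energy`
    of the (normalized) singular-value mass.  The paper's exact metric differs."""
    mat = conv_weight.reshape(conv_weight.shape[0], -1)
    s = np.linalg.svd(mat, compute_uv=False)           # singular values, descending
    s = s / (s.sum() + 1e-12)                           # normalize the spectrum
    rank = int(np.searchsorted(np.cumsum(s), energy)) + 1
    return rank / len(s)                                # effective-rank ratio in [0, 1]

def mapping_condition(conv_weight):
    """Illustrative proxy for 'mapping condition': ratio of the largest to the
    smallest singular value of the unfolded kernel (its condition number)."""
    mat = conv_weight.reshape(conv_weight.shape[0], -1)
    s = np.linalg.svd(mat, compute_uv=False)
    return s[0] / (s[-1] + 1e-12)

def adapt_lr(base_lr, gain_now, gain_prev, min_scale=0.05):
    """Assumed update rule (not AdaS's exact formula): scale the step-size by the
    positive change in knowledge gain between successive epochs, so layers that
    have stopped gaining knowledge receive a smaller learning rate."""
    delta = max(gain_now - gain_prev, 0.0)
    return base_lr * max(delta, min_scale)

# Toy usage on a random 3x3 kernel with 64 output and 32 input channels.
w_prev = np.random.randn(64, 32, 3, 3)
w_now = w_prev + 0.01 * np.random.randn(64, 32, 3, 3)
lr = adapt_lr(0.1, knowledge_gain(w_now), knowledge_gain(w_prev))
print(f"per-layer learning rate: {lr:.4f}")
```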


Related research

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence (02/24/2020)
We propose a stochastic variant of the classical Polyak step-size (Polya...

diffGrad: An Optimization Method for Convolutional Neural Networks (09/12/2019)
Stochastic Gradient Descent (SGD) is one of the core techniques behind th...

Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments (09/03/2023)
This paper introduces a novel approach to enhance the performance of the...

Exploiting Explainable Metrics for Augmented SGD (03/31/2022)
Explaining the generalization characteristics of deep learning is an eme...

signADAM: Learning Confidences for Deep Neural Networks (07/21/2019)
In this paper, we propose a new first-order gradient-based algorithm to ...

Deep Frank-Wolfe For Neural Network Optimization (11/19/2018)
Learning a deep neural network requires solving a challenging optimizati...

On the distance between two neural networks and the stability of learning (02/09/2020)
How far apart are two neural networks? This is a foundational question i...
