Dynamic Batch Adaptation

08/01/2022
by Cristian Simionescu, et al.

Current deep learning adaptive optimizer methods adjust the step magnitude of parameter updates by altering the effective learning rate used by each parameter. Motivated by the known inverse relation between batch size and learning rate on update step magnitudes, we introduce a novel training procedure that dynamically decides the dimension and the composition of the current update step. Our procedure, Dynamic Batch Adaptation (DBA), analyzes the gradients of every sample and selects the subset that best improves certain metrics, such as gradient variance, for each layer of the network. We present results showing DBA significantly improves the speed of model convergence. Additionally, we find that DBA produces an increased improvement over standard optimizers when used in data-scarce conditions where, in addition to convergence speed, it also significantly improves model generalization, managing to train a network with a single fully connected hidden layer using only 1% of the training data. In an extreme scenario, it manages to reach 97.44% test accuracy using only a few samples per class. These results represent a relative error rate reduction of 81.78% compared to the standard optimizers, Stochastic Gradient Descent (SGD) and Adam.
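
The abstract describes the selection mechanism only at a high level, so the sketch below illustrates one way per-sample gradient selection driven by per-layer gradient variance could look in PyTorch. It is a minimal sketch under my own assumptions: the helper names (per_sample_grads, variance_score, select_subset), the greedy drop rule, the stopping criterion, and the toy two-layer model are illustrative stand-ins, not the paper's actual algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy model standing in for a single-hidden-layer MLP (sizes are assumptions).
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def per_sample_grads(x, y):
    """Return one [B, *param.shape] tensor per parameter, holding each sample's gradient.
    Loop-based for clarity; torch.func's vmap/grad transforms can batch this."""
    slots = [[] for _ in model.parameters()]
    for xi, yi in zip(x, y):
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        gs = torch.autograd.grad(loss, list(model.parameters()))
        for slot, g in zip(slots, gs):
            slot.append(g.detach())
    return [torch.stack(slot) for slot in slots]

def variance_score(grads, idx):
    """Summed per-layer gradient variance of the subset `idx` (assumed selection metric)."""
    return sum(g[idx].var(dim=0, unbiased=False).sum() for g in grads).item()

def select_subset(grads, batch_size, min_size=4):
    """Greedily drop the sample whose removal most reduces the variance score."""
    idx = list(range(batch_size))
    while len(idx) > min_size:
        current = variance_score(grads, idx)
        scores = [(variance_score(grads, idx[:k] + idx[k + 1:]), k) for k in range(len(idx))]
        best_score, best_k = min(scores)
        if best_score >= current:  # stop once dropping any sample no longer helps
            break
        idx.pop(best_k)
    return idx

def dba_like_step(x, y):
    grads = per_sample_grads(x, y)
    idx = select_subset(grads, x.shape[0])
    for p, g in zip(model.parameters(), grads):
        p.grad = g[idx].mean(dim=0)  # update uses only the selected subset's mean gradient
    opt.step()

# Usage with random tensors standing in for MNIST-shaped data.
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
dba_like_step(x, y)
```

Note that the loop costs one backward pass per sample; any practical variant would need batched per-sample gradients (e.g. via torch.func) to keep the selection step affordable, and the paper's actual procedure may differ substantially from this greedy heuristic.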

Related research

- Coupling Adaptive Batch Sizes with Learning Rates (12/15/2016)
  Mini-batch stochastic gradient descent and variants thereof have become ...
- Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate Scaling (LARS) Optimizer (02/05/2021)
  Increasing the batch size of a deep learning model is a challenging task...
- Step-size Adaptation Using Exponentiated Gradient Updates (01/31/2022)
  Optimizers like Adam and AdaGrad have been very successful in training l...
- Adaptive Gradient Method with Resilience and Momentum (10/21/2020)
  Several variants of stochastic gradient descent (SGD) have been proposed...
- Population Gradients improve performance across data-sets and architectures in object classification (10/23/2020)
  The most successful methods such as ReLU transfer functions, batch norma...
- Neograd: gradient descent with an adaptive learning rate (10/15/2020)
  Since its inception by Cauchy in 1847, the gradient descent algorithm ha...
- Memory Augmented Optimizers for Deep Learning (06/20/2021)
  Popular approaches for minimizing loss in data-driven learning often inv...
