Normalized Stochastic Gradient Descent Training of Deep Neural Networks

12/20/2022
by Salih Atici, et al.

In this paper, we introduce Normalized Stochastic Gradient Descent (NSGD), a novel optimization algorithm for training machine learning models, inspired by the Normalized Least Mean Squares (NLMS) algorithm from adaptive filtering. When training a high-complexity model on a large dataset, the choice of learning rate is critical, as poorly chosen optimizer parameters can lead to divergence. The algorithm updates the network weights using the stochastic gradient, but with ℓ_1- and ℓ_2-based normalizations of the learning rate parameter, similar to the NLMS algorithm. The main difference from existing normalization methods is that we do not include the error term in the normalization process; instead, we normalize the update term using the input vector to the neuron. Our experiments show that, with our optimization algorithm, the model can be trained to a higher accuracy under different initial settings. In this paper, we demonstrate the efficiency of our training algorithm using ResNet-20 and a toy neural network on different benchmark datasets with different initializations. NSGD improves the accuracy of ResNet-20 from 91.96% to 92.20% on the CIFAR-10 dataset.
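Below is a minimal NumPy sketch of an NSGD-style update for a single linear neuron, based only on the description above; the exact normalization rule, the small eps constant, and the function name nsgd_update are illustrative assumptions and may differ from the paper's formulation. The step size is scaled by the ℓ_1 or ℓ_2 norm of the neuron's input vector, and the error term is not part of the normalization.

```python
import numpy as np

def nsgd_update(W, grad_W, x, lr=0.1, eps=1e-8, norm="l2"):
    """Sketch of an NSGD-style step: scale the learning rate by the norm
    of the input vector x to the neuron (assumption; not the paper's exact rule)."""
    if norm == "l2":
        scale = np.dot(x, x)       # ||x||_2^2, as in classical NLMS
    else:
        scale = np.abs(x).sum()    # ||x||_1 variant
    return W - (lr / (eps + scale)) * grad_W

# Toy usage: one linear neuron and a single training sample
rng = np.random.default_rng(0)
x = rng.normal(size=4)             # input vector to the neuron
W = rng.normal(size=4)             # neuron weights
y_true = 1.0
y_pred = W @ x
grad_W = (y_pred - y_true) * x     # gradient of 0.5 * (y_pred - y_true)^2
W = nsgd_update(W, grad_W, x, lr=0.5)
```

As a rough intuition for the design choice, dividing the step by the input norm keeps the update magnitude comparable across neurons whose inputs differ widely in scale, which is the same robustness argument used for NLMS over plain LMS.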


