Normalized Gradient with Adaptive Stepsize Method for Deep Neural Network Training

07/16/2017
by   Adams Wei Yu, et al.

In this paper, we propose a generic and simple algorithmic framework for first-order optimization. The framework consists of two consecutive steps in each iteration: 1) computing and normalizing the mini-batch stochastic gradient; 2) selecting an adaptive step size and updating the decision variable (parameter) in the direction of the negative normalized gradient. We show that the proposed approach, when combined with popular adaptive step-size methods such as AdaGrad, enjoys a sublinear convergence rate when the objective is convex. We also conduct extensive empirical studies on various non-convex neural network optimization problems, including multi-layer perceptrons, convolutional neural networks, and recurrent neural networks. The results indicate that normalized gradients with adaptive step sizes can help accelerate the training of neural networks. In particular, significant speedup is observed when the networks are deep or the dependencies are long.
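The two steps described above translate directly into code. Below is a minimal NumPy sketch of one update, not the authors' implementation: the function name ngd_adagrad_step, the global (rather than per-layer or per-block) normalization, and the hyperparameter values are illustrative assumptions.

import numpy as np

def ngd_adagrad_step(param, grad, accum, lr=0.01, eps=1e-8):
    # Step 1: normalize the mini-batch stochastic gradient.
    # (Here the whole gradient is normalized; a per-layer variant
    # would normalize each parameter block separately.)
    g_hat = grad / (np.linalg.norm(grad) + eps)

    # Step 2: AdaGrad-style adaptive step size applied to the
    # normalized direction (running sum of squared normalized gradients).
    accum = accum + g_hat ** 2
    param = param - lr * g_hat / (np.sqrt(accum) + eps)
    return param, accum

# Toy usage on a convex quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
x = np.array([5.0, -3.0])
accum = np.zeros_like(x)
for _ in range(1000):
    x, accum = ngd_adagrad_step(x, x, accum, lr=0.1)
print(x)  # expected to end up close to the minimizer at the origin

Because the gradient is normalized before the AdaGrad accumulator is updated, the effective step size here shrinks roughly as lr / sqrt(t), independent of the raw gradient magnitude; this is the mechanism the abstract credits for the speedup on deep networks, where raw gradient norms can vary widely across layers.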


