NAMSG: An Efficient Method For Training Neural Networks

05/04/2019
by   Yushu Chen, et al.
Tsinghua University
NetEase, Inc

We introduce NAMSG, an adaptive first-order algorithm for training neural networks. The method is efficient in computation and memory and is straightforward to implement. It computes gradients at configurable remote observation points, which accelerates convergence in the stochastic setting by adjusting the step size for directions with different curvatures. It also scales the update vector elementwise by a nonincreasing preconditioner, retaining the advantages of AMSGRAD. We analyze the convergence properties for both convex and nonconvex problems by modeling the training process as a dynamic system, and provide a guideline for selecting the observation distance without grid search. We also propose a data-dependent regret bound, which guarantees convergence in the convex setting. Experiments demonstrate that NAMSG works well in practice and compares favorably to popular adaptive methods such as ADAM, NADAM, and AMSGRAD.
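The two ingredients named in the abstract, a gradient evaluated at a remote observation point and an elementwise nonincreasing preconditioner in the style of AMSGRAD, can be illustrated with a minimal NumPy sketch. The abstract does not state the update rule, so this is not the paper's exact algorithm: the function name `amsgrad_with_observation_point`, the parameter `mu` (standing in for the observation distance), and the particular choice of observation point (the current iterate displaced along the previous preconditioned update) are all assumptions made for illustration. The moment estimates and the running maximum follow the standard AMSGRAD recipe without bias correction.

```python
import numpy as np


def amsgrad_with_observation_point(grad_fn, x0, steps=1000, lr=0.05,
                                   beta1=0.9, beta2=0.999, mu=0.1, eps=1e-8):
    """AMSGRAD-style update with the gradient taken at a displaced point.

    `mu` is a hypothetical name for the observation distance; the rule used
    to place the observation point is an assumption, not taken from the paper.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)      # first-moment estimate (momentum)
    v = np.zeros_like(x)      # second-moment estimate
    v_hat = np.zeros_like(x)  # running elementwise maximum of v

    for _ in range(steps):
        # Observation point: look ahead along the previous preconditioned
        # update direction. On the first step m is zero, so x_obs == x.
        x_obs = x - mu * lr * m / (np.sqrt(v_hat) + eps)
        g = grad_fn(x_obs)

        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # v_hat never decreases, so the preconditioner 1 / sqrt(v_hat)
        # never increases (the AMSGRAD property).
        v_hat = np.maximum(v_hat, v)

        x = x - lr * m / (np.sqrt(v_hat) + eps)

    return x, v_hat


def quadratic_grad(x):
    """Gradient of 0.5 * sum(c * x**2) with curvatures 1 and 100."""
    return np.array([1.0, 100.0]) * x


x_final, v_hat_final = amsgrad_with_observation_point(quadratic_grad, [1.0, 1.0])
```

The ill-conditioned quadratic is used only because it has two directions with very different curvatures, which is the situation the abstract refers to; setting `mu=0` in this sketch reduces it to plain AMSGRAD without bias correction.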


