Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization

10/18/2021
by Tao Sun, et al.

Heavy-ball momentum is crucial for accelerating (stochastic) gradient-based optimization algorithms in machine learning. Existing heavy-ball momentum is usually weighted by a single fixed hyperparameter, which requires extensive tuning; moreover, even a carefully calibrated fixed value may not yield optimal performance. In this paper, to eliminate the effort of tuning the momentum hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of heavy-ball momentum for quadratic optimization. The proposed adaptive heavy-ball momentum can improve both stochastic gradient descent (SGD) and Adam: with the newly designed adaptive momentum, they are more robust to large learning rates, converge faster, and generalize better than the baselines. We verify the efficiency of SGD and Adam with the new adaptive momentum on extensive machine learning benchmarks, including image classification, language modeling, and machine translation. Finally, we provide convergence guarantees for SGD and Adam with the proposed adaptive momentum.
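For background on the quadratic theory the abstract refers to: the classical heavy-ball (Polyak momentum) update is x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1}), and for a quadratic objective whose Hessian eigenvalues lie in [mu, L], the optimal hyperparameters are known in closed form: alpha = 4 / (sqrt(L) + sqrt(mu))^2 and beta = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))^2. The sketch below illustrates this quadratic-optimal momentum in NumPy. It shows only the classical result that motivates the paper, not the authors' adaptive rule (which is not specified in the abstract); the problem setup (A, b, mu, L) is an illustrative assumption.

```python
# A minimal sketch of Polyak's heavy-ball method on a quadratic
# f(x) = 0.5 * x^T A x - b^T x, using the classical optimal step size
# and momentum coefficient for quadratics. This is NOT the paper's
# adaptive method; it only demonstrates the quadratic-optimal beta
# that inspires it. All problem data here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric positive-definite A with eigenvalues in [mu, L].
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
mu, L = 0.1, 10.0                      # smallest / largest curvature
A = Q @ np.diag(np.linspace(mu, L, n)) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)         # exact minimizer, for error tracking

# Classical optimal hyperparameters for heavy ball on quadratics.
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x_prev = np.zeros(n)
x = np.zeros(n)
for _ in range(200):
    grad = A @ x - b
    # Heavy-ball update: gradient step plus momentum term beta*(x - x_prev).
    x, x_prev = x - alpha * grad + beta * (x - x_prev), x

print(f"beta = {beta:.4f}, final error = {np.linalg.norm(x - x_star):.2e}")
```

Note that this closed-form beta presumes mu and L are known, which is unrealistic for deep networks; the paper's contribution, as described above, is an adaptive momentum that removes this tuning burden.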

Related research

12/03/2020 · Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum
Momentum plays a crucial role in stochastic gradient-based optimization ...

12/20/2017 · ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent
Two major momentum-based techniques that have achieved tremendous succes...

06/15/2022 · On the fast convergence of minibatch heavy ball momentum
Simple stochastic momentum methods are widely used in machine learning o...

06/14/2020 · On the convergence of the Stochastic Heavy Ball Method
We provide a comprehensive analysis of the Stochastic Heavy Ball (SHB) m...

07/25/2019 · DEAM: Accumulated Momentum with Discriminative Weight for Stochastic Optimization
Optimization algorithms with momentum, e.g., Nesterov Accelerated Gradie...

10/30/2017 · Linearly convergent stochastic heavy ball method for minimizing generalization error
In this work we establish the first linear convergence result for the st...

07/19/2019 · Lookahead Optimizer: k steps forward, 1 step back
The vast majority of successful deep neural networks are trained using v...
