Nonlinear Acceleration of Deep Neural Networks

05/24/2018
by   Damien Scieur, et al.
2

Regularized nonlinear acceleration (RNA) is a generic extrapolation scheme for optimization methods, with marginal computational overhead. It aims to improve convergence using only the iterates of simple iterative algorithms. However, so far its application to optimization was theoretically limited to gradient descent and other single-step algorithms. Here, we adapt RNA to a much broader setting including stochastic gradient with momentum and Nesterov's fast gradient. We use it to train deep neural networks, and empirically observe that extrapolated networks are more accurate, especially in the early iterations. A straightforward application of our algorithm when training ResNet-152 on ImageNet produces a top-1 test error of 20.88 classification pipeline. Furthermore, the code runs offline in this case, so it never negatively affects performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/17/2018

Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks

In this paper we explore acceleration techniques for large scale nonconv...
research
06/01/2018

Nonlinear Acceleration of CNNs

The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleratio...
research
07/19/2018

A unified theory of adaptive stochastic gradient descent as Bayesian filtering

There are a diverse array of schemes for adaptive stochastic gradient de...
research
05/28/2019

Direct Nonlinear Acceleration

Optimization acceleration techniques such as momentum play a key role in...
research
10/05/2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Gradient clipping is commonly used in training deep neural networks part...
research
08/13/2022

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

Adaptive gradient algorithms borrow the moving average idea of heavy bal...
research
01/17/2020

Gradient descent with momentum — to accelerate or to super-accelerate?

We consider gradient descent with `momentum', a widely used method for l...

Please sign up or login with your details

Forgot password? Click here to reset