Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks

05/17/2018
by Guangzeng Xie, et al.

In this paper we explore acceleration techniques for large-scale nonconvex optimization problems, with a special focus on deep neural networks. The extrapolation scheme is a classical approach to accelerating stochastic gradient descent for convex optimization, but it typically does not work well for nonconvex optimization. Instead, we propose an interpolation scheme to accelerate nonconvex optimization and call the resulting method Interpolatron. We explain the motivation behind Interpolatron and conduct a thorough empirical analysis. Empirical results on very deep DNNs (e.g., 98-layer and 200-layer ResNets) on CIFAR-10 and ImageNet show that Interpolatron converges much faster than state-of-the-art methods such as SGD with momentum and Adam. Furthermore, Anderson's acceleration, in which the mixing coefficients are computed by least-squares estimation, can also be used to improve performance. Both Interpolatron and Anderson's acceleration are easy to implement and tune. We also show that Interpolatron has a linear convergence rate under certain regularity assumptions.
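The abstract does not spell out Interpolatron's exact update rule, but the Anderson-style variant it mentions follows a standard pattern: keep a short window of recent iterates and their residuals, compute mixing coefficients by constrained least squares, and take the mixed point as the next iterate. The sketch below illustrates that idea for a plain SGD fixed-point map; the function name `anderson_mix`, the window layout, and the choice of `-lr * g` as the residual are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def anderson_mix(params_hist, grads_hist, lr=0.1, reg=1e-8):
    """One Anderson-style mixing step (a sketch, not the paper's exact method).

    params_hist : list of recent parameter vectors x_{k-m+1}, ..., x_k (1-D arrays)
    grads_hist  : list of the corresponding (stochastic) gradients
    The residual of iterate x_i is taken to be the plain SGD step -lr * g_i,
    and mixing coefficients alpha are chosen by least squares to minimize
    the norm of the combined residual, subject to sum(alpha) = 1.
    """
    X = np.stack(params_hist, axis=1)                     # d x m matrix of iterates
    R = np.stack([-lr * g for g in grads_hist], axis=1)   # d x m matrix of residuals
    m = X.shape[1]

    # Solve min_alpha ||R alpha||^2 s.t. 1^T alpha = 1 via the normal equations,
    # with a small Tikhonov term for numerical stability.
    G = R.T @ R + reg * np.eye(m)
    ones = np.ones(m)
    alpha = np.linalg.solve(G, ones)
    alpha /= ones @ alpha

    # Mixed iterate: a weighted combination of past points plus their residuals.
    return X @ alpha + R @ alpha
```

In practice one would keep only a small window (say, the last 5 iterates) and fall back to the plain SGD step whenever the least-squares system is ill-conditioned; how Interpolatron's interpolation coefficients are actually chosen is detailed in the full paper.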
