On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

02/19/2018
by Sanjeev Arora, et al.

Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization: linear neural networks, a well-studied model. Theoretical analysis, as well as experiments, show that in this setting depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with ℓ_p loss, p > 2, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. We also prove that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer.
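To make the ℓ_p example concrete, below is a minimal sketch (not the authors' code) comparing plain gradient descent on a convex ℓ_4 regression objective with gradient descent on an overparameterized reformulation, where the weight vector is written as a scalar times a vector, i.e. a depth-2 linear network. The synthetic data, learning rate, step count, and initialization are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (an illustration, not the paper's code): l_4 linear regression
# solved by (1) plain gradient descent on the convex objective, and (2) gradient
# descent on an overparameterized objective in which w is factored as a * w1
# (a depth-2 linear network). Data, step size, and step count are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 100, 10, 4
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def grad_lp(w):
    # gradient of (1/n) * sum_i |x_i . w - y_i|^p with respect to w
    r = X @ w - y
    return (p / n) * X.T @ (np.sign(r) * np.abs(r) ** (p - 1))

lr, steps = 1e-3, 5000

# (1) plain gradient descent on w
w = np.zeros(d)
for _ in range(steps):
    w -= lr * grad_lp(w)

# (2) gradient descent on the factors (w1, a) of the overparameterized objective,
#     where the end-to-end weight vector is a * w1
w1, a = np.zeros(d), 1.0              # simple initialization (an assumption)
for _ in range(steps):
    g = grad_lp(a * w1)               # gradient w.r.t. the end-to-end vector
    w1, a = w1 - lr * a * g, a - lr * float(g @ w1)   # chain rule through the factorization

print("plain GD loss:            ", np.mean(np.abs(X @ w - y) ** p))
print("overparameterized GD loss:", np.mean(np.abs(X @ (a * w1) - y) ** p))
```

Varying p, the step size, and the number of iterations gives a feel for the effect; roughly speaking, the paper's analysis shows that the overparameterized dynamics amount to a particular preconditioning of the plain gradient, combining an adaptive step size with momentum-like behavior.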

Related research:

- 06/01/2018: Nonlinear Acceleration of CNNs. The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleratio...
- 10/11/2019: Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning. We investigate the theoretical limits of pipeline parallel learning of d...
- 09/23/2018: Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks. In this note, we study the dynamics of gradient descent on objective fun...
- 05/17/2018: Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks. In this paper we explore acceleration techniques for large scale nonconv...
- 12/15/2017: Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice. We introduce a generic scheme for accelerating gradient-based optimizati...
- 12/20/2017: Statistical Inference for the Population Landscape via Moment Adjusted Stochastic Gradients. Modern statistical inference tasks often require iterative optimization ...
- 01/16/2020: Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks. The selection of initial parameter values for gradient-based optimizatio...
