Training (Overparametrized) Neural Networks in Near-Linear Time

06/20/2020
by Jan van den Brand et al.

The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks initiated an ongoing effort to develop faster second-order optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate (independent of the training batch size n), second-order algorithms incur a daunting slowdown in the cost per iteration (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [ZMG19, CGH+19], yielding an O(Mn^2)-time second-order algorithm for training overparametrized neural networks with M parameters. We show how to speed up the algorithm of [CGH+19], achieving an Õ(Mn)-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension (Mn) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an ℓ_2-regression problem, and then use a Fast-JL type dimension reduction to precondition the underlying Gram matrix in time independent of M, allowing a sufficiently good approximate solution to be found via first-order conjugate gradient. Our result provides a proof-of-concept that advanced machinery from randomized linear algebra, which led to recent breakthroughs in convex optimization (ERM, LPs, regression), can be carried over to the realm of deep learning as well.
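To make the sketch-to-precondition idea concrete, below is a minimal NumPy illustration of the general technique the abstract describes: the Gauss-Newton step is reduced to solving a Gram system (J J^T) x = r with the Jacobian J of shape n x M (M >> n), a randomized sketch of J's columns yields a cheap preconditioner, and the system is then solved by preconditioned conjugate gradient using only O(Mn) matrix-vector products. This is a hedged sketch, not the authors' implementation: the function name sketch_precondition_cg, the plain Gaussian sketch (standing in for the Fast-JL transform of the paper), the Cholesky-based preconditioner, the small ridge term, and all dimensions are illustrative assumptions.

```python
import numpy as np

def sketch_precondition_cg(J, r, sketch_dim, tol=1e-8, max_iter=50, seed=0):
    """Solve (J J^T) x = r by sketch-preconditioned conjugate gradient.

    J is n x M with M >> n. A Gaussian sketch (an assumption here, in place
    of a Fast-JL transform) compresses the M columns of J to `sketch_dim`
    columns; the Cholesky factor of the sketched Gram matrix serves as a
    preconditioner, and CG touches J only through O(Mn) mat-vecs.
    """
    n, M = J.shape
    rng = np.random.default_rng(seed)

    # Sketch: JS has shape n x sketch_dim and JS @ JS.T approximates J @ J.T.
    S = rng.standard_normal((M, sketch_dim)) / np.sqrt(sketch_dim)
    JS = J @ S

    # Preconditioner: Cholesky factor of the sketched Gram matrix
    # (tiny ridge added for numerical safety -- an illustrative choice).
    L = np.linalg.cholesky(JS @ JS.T + 1e-10 * np.eye(n))

    def gram_matvec(v):
        # (J J^T) v via two O(Mn) products, never forming the Gram matrix.
        return J @ (J.T @ v)

    def apply_precond(v):
        # (L L^T)^{-1} v via two triangular solves.
        return np.linalg.solve(L.T, np.linalg.solve(L, v))

    # Standard preconditioned conjugate gradient.
    x = np.zeros(n)
    res = r - gram_matvec(x)
    z = apply_precond(res)
    p = z.copy()
    rz = res @ z
    for _ in range(max_iter):
        Ap = gram_matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        res -= alpha * Ap
        if np.linalg.norm(res) < tol * np.linalg.norm(r):
            break
        z = apply_precond(res)
        rz_new = res @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

if __name__ == "__main__":
    # Toy usage: n = 64 samples, M = 20000 parameters (illustrative sizes).
    rng = np.random.default_rng(1)
    J = rng.standard_normal((64, 20000))
    r = rng.standard_normal(64)
    x = sketch_precondition_cg(J, r, sketch_dim=256)
    print(np.linalg.norm(J @ (J.T @ x) - r))  # residual should be small
```

Because the sketched Gram matrix is spectrally close to J J^T, the preconditioned system is well conditioned and CG converges in few iterations; the expensive per-iteration cost is just the two J mat-vecs, which is the source of the near-linear running time in the Jacobian dimension.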


Related research

05/28/2019 - A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems
07/07/2021 - Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization
12/17/2022 - Improving Levenberg-Marquardt Algorithm for Neural Networks
08/09/2022 - Training Overparametrized Neural Networks in Sublinear Time
08/02/2017 - On the Importance of Consistency in Training Deep Neural Networks
12/01/2020 - Asymptotic convergence rate of Dropout on shallow linear neural networks
09/03/2013 - SKYNET: an efficient and robust neural network training tool for machine learning in astronomy
