Continuous vs. Discrete Optimization of Deep Neural Networks

07/14/2021
by Omer Elkabetz, et al.

Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent. Gradient flow is amenable to theoretical analysis, but is stylized and disregards computational efficiency. The extent to which it represents gradient descent is an open question in deep learning theory. The current paper studies this question. Viewing gradient descent as an approximate numerical solution to the initial value problem of gradient flow, we find that the degree of approximation depends on the curvature along the latter's trajectory. We then show that over deep neural networks with homogeneous activations, gradient flow trajectories enjoy favorable curvature, suggesting they are well approximated by gradient descent. This finding allows us to translate an analysis of gradient flow over deep linear neural networks into a guarantee that gradient descent efficiently converges to a global minimum almost surely under random initialization. Experiments suggest that over simple deep neural networks, gradient descent with conventional step size is indeed close to the continuous limit. We hypothesize that the theory of gradient flows will be central to unraveling mysteries behind deep learning.
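To make the abstract's framing concrete: gradient descent with step size η is the explicit Euler discretization of the gradient flow ODE dθ/dt = -∇L(θ), so its k-th iterate approximates the flow at time t_k = kη. The sketch below is not from the paper; the toy quadratic loss, step size, and initialization are arbitrary choices for illustration. It compares gradient descent iterates against a high-accuracy numerical solution of gradient flow sampled at the matching times.

```python
# Minimal sketch (illustrative assumptions, not the paper's setup):
# gradient descent is the explicit Euler discretization of gradient flow
# d(theta)/dt = -grad L(theta). Compare GD iterates with an accurate ODE
# solution of gradient flow on a toy quadratic loss.

import numpy as np
from scipy.integrate import solve_ivp

# Toy loss L(theta) = 0.5 * theta^T A theta with a fixed positive definite A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad(theta):
    return A @ theta

theta0 = np.array([1.0, -1.0])
eta, num_steps = 0.1, 50          # conventional (moderate) step size

# Gradient descent: theta_{k+1} = theta_k - eta * grad L(theta_k)
gd_traj = [theta0]
for _ in range(num_steps):
    gd_traj.append(gd_traj[-1] - eta * grad(gd_traj[-1]))
gd_traj = np.array(gd_traj)

# Gradient flow: solve the ODE accurately and sample it at times t_k = k * eta,
# the continuous-time points that GD's k-th iterate approximates.
times = eta * np.arange(num_steps + 1)
flow = solve_ivp(lambda t, th: -grad(th), (0.0, times[-1]), theta0,
                 t_eval=times, rtol=1e-10, atol=1e-12)
flow_traj = flow.y.T

# Distance between the discrete iterates and the continuous trajectory.
gap = np.linalg.norm(gd_traj - flow_traj, axis=1)
print("max |GD - gradient flow| along the trajectory:", gap.max())
```

On a well-conditioned convex example like this the gap stays small; the paper's contribution is a curvature-based argument for why a similar closeness holds along gradient flow trajectories of deep networks with homogeneous activations.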


Related research

09/14/2018 · Secondary gradient descent in higher codimension
In this paper, we analyze discrete gradient descent and ϵ-noisy gradient...

08/14/2018 · Discrete gradient descent differs qualitatively from gradient flow
We consider gradient descent on functions of the form L_1 = |f| and L_2 ...

05/26/2022 · A framework for overparameterized learning
An explanation for the success of deep neural networks is a central ques...

12/20/2020 · Recent advances in deep learning theory
Deep learning is usually described as an experiment-driven field under c...

04/06/2021 · Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity Bounds for Neural Network Approximation Spaces
We study the computational complexity of (deterministic or randomized) a...

02/29/2020 · Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning
The success of deep learning is due, to a great extent, to the remarkabl...

12/23/2019 · BackPACK: Packing more into backprop
Automatic differentiation frameworks are optimized for exactly one thing...
