
Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning

by Chaoyue Liu, et al.

The success of deep learning is due, to a great extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. In this work we isolate some general mathematical structures that allow for efficient optimization in over-parameterized systems of non-linear equations, a setting that includes deep neural networks. In particular, we show that optimization problems corresponding to such systems are not convex, even locally, but instead satisfy the Polyak-Łojasiewicz (PL) condition, which allows for efficient optimization by gradient descent or SGD. We connect the PL condition of these systems to the condition number associated with the tangent kernel and develop a non-linear theory parallel to classical analyses of over-parameterized linear equations. We discuss how these ideas apply to training shallow and deep neural networks. Finally, we point out that tangent kernels associated with certain large systems may be far from constant, even locally. Yet our analysis still allows us to demonstrate the existence of solutions and the convergence of gradient descent and SGD.
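For intuition, the PL condition states that (1/2)·‖∇L(w)‖² ≥ μ·(L(w) − L*) for some μ > 0. It does not require convexity, yet it still guarantees linear (geometric) convergence of gradient descent. A minimal numerical sketch of this point (my own illustration, not code from the paper), using the standard textbook example f(x) = x² + 3 sin²(x), which is non-convex but satisfies the PL condition:

```python
import math

# f is non-convex (its second derivative changes sign) but satisfies the
# PL condition with some mu > 0, with unique global minimum f* = 0 at x = 0.
def f(x):
    return x**2 + 3.0 * math.sin(x)**2

def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

x = 2.0    # arbitrary starting point
lr = 0.1   # step size below 2/L for the smoothness constant L = 8
losses = [f(x)]
for _ in range(200):
    x -= lr * grad_f(x)
    losses.append(f(x))

# Despite non-convexity, gradient descent drives the loss to f* = 0
# at a geometric rate, as the PL condition predicts.
print(losses[0], losses[-1])
```

The same mechanism is what the paper exploits at scale: for over-parameterized systems, the smallest eigenvalue of the tangent kernel plays the role of the PL constant μ.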

