A global convergence theory for deep ReLU implicit networks via over-parameterization

10/11/2021
by Tianxiang Gao, et al.

Implicit deep learning has received increasing attention recently because it generalizes the recursive prediction rules of many commonly used neural network architectures. Its prediction rule is defined implicitly through the solution of an equilibrium equation. Although a line of recent empirical studies has demonstrated its superior performance, theoretical understanding of implicit neural networks remains limited. In general, the equilibrium equation may not be well-posed during training, so there is no guarantee that vanilla (stochastic) gradient descent ((S)GD) converges when training nonlinear implicit neural networks. This paper fills the gap by analyzing the gradient flow of Rectified Linear Unit (ReLU) activated implicit neural networks. For an implicit neural network of width m with ReLU activation and n training samples, we show that randomly initialized gradient descent converges to a global minimum of the square loss at a linear rate, provided the network is sufficiently over-parameterized. It is worth noting that, unlike existing works on the convergence of (S)GD for finite-layer over-parameterized neural networks, our convergence results hold for implicit neural networks, in which the number of layers is effectively infinite.
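To make the implicit prediction rule concrete, the sketch below solves an equilibrium equation of the form z = ReLU(Az + Bx) by fixed-point iteration and reads out a scalar prediction. This is a minimal illustration assuming a common parameterization of implicit/equilibrium layers; the names A, B, c, fixed_point, predict and the 1/sqrt(m) scaling are illustrative choices, not the paper's exact construction.

import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def fixed_point(A, B, x, n_iters=100, tol=1e-8):
    """Solve the equilibrium equation z = ReLU(A z + B x) by fixed-point iteration.
    Convergence here relies on the map being a contraction (e.g. spectral norm of A < 1)."""
    z = np.zeros(A.shape[0])
    for _ in range(n_iters):
        z_next = relu(A @ z + B @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def predict(A, B, c, x):
    """Implicit prediction: solve for the equilibrium z, then read out y = c^T z."""
    z = fixed_point(A, B, x)
    return c @ z

# Toy usage: hidden width m, input dimension d (illustrative values).
rng = np.random.default_rng(0)
m, d = 64, 10
A = rng.normal(size=(m, m)) / (4 * np.sqrt(m))  # scaled so ||A||_2 is roughly 0.5
B = rng.normal(size=(m, d)) / np.sqrt(m)
c = rng.normal(size=m) / np.sqrt(m)
x = rng.normal(size=d)
print(predict(A, B, c, x))

Scaling A so that its spectral norm stays below 1 keeps the ReLU fixed-point map a contraction, which is one simple way the equilibrium equation remains well-posed; the paper's contribution is showing that, under over-parameterization, gradient descent on such networks converges globally even though the effective depth is infinite.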

Related research

05/16/2022 - Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths
Implicit deep learning has recently become popular in the machine learni...

03/05/2019 - Implicit Regularization in Over-parameterized Neural Networks
Over-parameterized neural networks generalize well in practice without a...

07/02/2020 - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach
Structural equation models (SEMs) are widely used in sciences, ranging f...

03/17/2017 - Implicit Gradient Neural Networks with a Positive-Definite Mass Matrix for Online Linear Equations Solving
Motivated by the advantages achieved by implicit analogue net for solvin...

12/21/2021 - More is Less: Inducing Sparsity via Overparameterization
In deep learning it is common to overparameterize the neural networks, t...

12/09/2020 - Implicit Regularization in ReLU Networks with the Square Loss
Understanding the implicit regularization (or implicit bias) of gradient...

05/27/2022 - Global Convergence of Over-parameterized Deep Equilibrium Models
A deep equilibrium model (DEQ) is implicitly defined through an equilibr...
