Global Convergence of Over-parameterized Deep Equilibrium Models

05/27/2022
by Zenan Ling, et al.

A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with input injection. Instead of unrolling infinitely many layers, it solves for the equilibrium point directly with a root-finding method and computes gradients with implicit differentiation. This study investigates the training dynamics of over-parameterized DEQs. Assuming a condition on the initial equilibrium point, we show that a unique equilibrium point always exists throughout training, and that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss. To show that the required initial condition is satisfied under mild over-parameterization, we perform a fine-grained analysis of random DEQs. We further propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
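The forward/backward scheme described above can be illustrated with a minimal sketch. This is not the paper's implementation: it uses naive fixed-point iteration in place of a proper root-finder (e.g., Broyden's method), a tanh activation, and a scalar quadratic loss; the function and variable names (forward_equilibrium, dloss_dW, v, etc.) are hypothetical.

```python
import numpy as np

def forward_equilibrium(W, U, x, tol=1e-8, max_iter=500):
    """Solve z* = tanh(W z* + U x) by fixed-point iteration
    (a stand-in for a proper root-finding solver)."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_new = np.tanh(W @ z + U @ x)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

def dloss_dW(W, U, x, y, v):
    """Gradient of the quadratic loss 0.5 * (v^T z* - y)^2 w.r.t. W,
    computed via implicit differentiation at the equilibrium point,
    not by backpropagating through the solver iterations."""
    z = forward_equilibrium(W, U, x)
    pre = W @ z + U @ x
    D = np.diag(1.0 - np.tanh(pre) ** 2)   # sigma'(pre) on the diagonal
    J = D @ W                              # Jacobian of the layer map w.r.t. z
    # Implicit function theorem: dz*/dW_{ij} = (I - J)^{-1} (sigma'(pre_i) e_i) z_j
    adjoint = np.linalg.solve((np.eye(len(z)) - J).T, v)  # (I - J)^{-T} v
    resid = v @ z - y
    return resid * np.outer(D @ adjoint, z)

# Toy usage with hypothetical dimensions; W is scaled down so a
# well-defined equilibrium point exists at initialization.
rng = np.random.default_rng(0)
d, p = 8, 4
W = 0.1 * rng.standard_normal((d, d))
U = rng.standard_normal((d, p))
x, y = rng.standard_normal(p), 1.0
v = rng.standard_normal(d)
print(dloss_dW(W, U, x, y, v).shape)  # (d, d)
```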


