On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers

02/15/2021
by Kenji Kawaguchi, et al.

A deep equilibrium model uses implicit layers, which are implicitly defined through an equilibrium point of an infinite sequence of computation. It avoids any explicit computation of the infinite sequence by finding an equilibrium point directly via root-finding and by computing gradients via implicit differentiation. In this paper, we analyze the gradient dynamics of deep equilibrium models with nonlinearity only on weight matrices and non-convex objective functions of weights for regression and classification. Despite the non-convexity, convergence to a global optimum at a linear rate is guaranteed without any assumption on the width of the models, allowing the width to be smaller than the output dimension and the number of data points. Moreover, we prove a relation between the gradient dynamics of the deep implicit layer and the dynamics of the trust region Newton method of a shallow explicit layer. This mathematically proven relation, along with our numerical observations, suggests the importance of understanding the implicit bias of implicit layers and points to an open problem on the topic. Our proofs deal with implicit layers, weight tying, and nonlinearity on weights, and differ from those in the related literature.
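
The two mechanisms named in the abstract, solving for the equilibrium point directly and differentiating through it implicitly, can be illustrated with a small NumPy sketch. This is not the model analyzed in the paper (which places the nonlinearity only on the weight matrices); the map `f(z, x, W) = tanh(W z + x)`, the helper names, and the rescaling of `W` to spectral norm 0.5 are all assumptions chosen for illustration so that a unique fixed point exists. Differentiating `z* = tanh(W z* + x)` with respect to `x` gives `dz*/dx = (I - D W)^{-1} D` with `D = diag(1 - z*^2)`, which the code compares against finite differences.

```python
import numpy as np

def f(z, x, W):
    # One application of a hypothetical weight-tied layer (illustrative only).
    return np.tanh(W @ z + x)

def solve_equilibrium(x, W, tol=1e-12, max_iter=10_000):
    # Plain fixed-point iteration, used here as a simple stand-in for a
    # root-finder applied to z - f(z, x, W) = 0.
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = f(z, x, W)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

def implicit_jacobian(z_star, W):
    # Implicit differentiation of z* = tanh(W z* + x) with respect to x:
    #   dz*/dx = (I - D W)^{-1} D,  where D = diag(1 - z*^2).
    # No iterate of the forward sequence is stored or differentiated.
    D = np.diag(1.0 - z_star**2)
    n = z_star.size
    return np.linalg.solve(np.eye(n) - D @ W, D)

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))
W *= 0.5 / np.linalg.norm(W, 2)  # assumption: spectral norm 0.5 makes f a contraction in z
x = rng.standard_normal(n)

z_star = solve_equilibrium(x, W)
J = implicit_jacobian(z_star, W)

# Central finite differences through the solver, for comparison only.
eps = 1e-6
J_fd = np.column_stack([
    (solve_equilibrium(x + eps * e, W) - solve_equilibrium(x - eps * e, W)) / (2 * eps)
    for e in np.eye(n)
])

print("fixed-point residual:", np.linalg.norm(f(z_star, x, W) - z_star))
print("max |J - J_fd|:", np.abs(J - J_fd).max())
```

The linear solve in `implicit_jacobian` depends only on the equilibrium point, so its cost is the same however many iterations the forward solver needed.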

research
05/27/2022

Global Convergence of Over-parameterized Deep Equilibrium Models

A deep equilibrium model (DEQ) is implicitly defined through an equilibr...
research
01/28/2022

Mixing Implicit and Explicit Deep Learning with Skip DEQs and Infinite Time Neural ODEs (Continuous DEQs)

Implicit deep learning architectures, like Neural ODEs and Deep Equilibr...
research
08/07/2021

Approximate Last Iterate Convergence in Overparameterized GANs

In this work, we showed that the Implicit Update and Predictive Methods ...
research
05/27/2021

Optimization Induced Equilibrium Networks

Implicit equilibrium models, i.e., deep neural networks (DNNs) defined b...
research
07/16/2023

Revisiting Implicit Models: Sparsity Trade-offs Capability in Weight-tied Model for Vision Tasks

Implicit models such as Deep Equilibrium Models (DEQs) have garnered sig...
research
10/22/2021

The Equilibrium Hypothesis: Rethinking implicit regularization in Deep Neural Networks

Modern Deep Neural Networks (DNNs) exhibit impressive generalization pro...
research
05/24/2022

Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width

Substantial work indicates that the dynamics of neural networks (NNs) is...