Deep equilibrium networks are sensitive to initialization statistics

07/19/2022
by Atish Agarwala et al.

Deep equilibrium networks (DEQs) are a promising way to construct models that trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher-order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives a practical prescription for initializations that permit training with a broader range of initial weight scales.
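
To make the prescription concrete, here is a minimal sketch (our illustration, not code from the paper) comparing naive forward fixed-point iteration of a toy one-layer DEQ, z = tanh(W z + x), under i.i.d. Gaussian, symmetric, and orthogonal initializations with matched entry variance. All function names, the tanh nonlinearity, and the forward-iteration solver are illustrative assumptions.

```python
import numpy as np

def gaussian_init(n, scale, rng):
    """i.i.d. Gaussian entries with variance scale**2 / n."""
    return scale * rng.standard_normal((n, n)) / np.sqrt(n)

def symmetric_init(n, scale, rng):
    """Symmetric (GOE-style) matrix with entry variance scale**2 / n.
    Same second moments as gaussian_init, but the semicircle law puts
    the spectral edge near 2 * scale rather than scale."""
    a = rng.standard_normal((n, n))
    return scale * (a + a.T) / np.sqrt(2 * n)

def orthogonal_init(n, scale, rng):
    """Scaled Haar-random orthogonal matrix: every singular value equals scale."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return scale * q * np.sign(np.diag(r))  # column sign fix for Haar measure

def deq_forward(W, x, n_iter=1000, tol=1e-6):
    """Naive forward iteration of the toy DEQ layer z = tanh(W z + x)."""
    z = np.zeros_like(x)
    for _ in range(n_iter):
        z_new = np.tanh(W @ z + x)
        if np.linalg.norm(z_new - z) < tol:
            return z_new, True
        z = z_new
    return z, False

rng = np.random.default_rng(0)
n, scale = 512, 0.9
x = rng.standard_normal(n)

for name, init in [("gaussian", gaussian_init),
                   ("symmetric", symmetric_init),
                   ("orthogonal", orthogonal_init)]:
    z, ok = deq_forward(init(n, scale, rng), x)
    print(f"{name:>10}: converged={ok}, ||z|| = {np.linalg.norm(z):.3f}")
```

Sweeping `scale` upward and watching where forward iteration stops converging for each matrix family gives a rough, hands-on view of the sensitivity to higher-order initialization statistics described above. Note that practical DEQ implementations solve for the fixed point with a root-finding method such as Broyden's rather than plain forward iteration.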


