Global Convergence Rate of Deep Equilibrium Models with General Activations

02/11/2023
by Lan V. Truong, et al.

In a recent paper, Ling et al. investigated the over-parameterized Deep Equilibrium Model (DEQ) with ReLU activation and proved that, for the quadratic loss, gradient descent converges to a globally optimal solution at a linear rate. In this paper, we show that this result still holds for DEQs with any activation function that has bounded first and second derivatives. Since such a general activation is non-linear, we design a general population Gram matrix and develop a new form of dual activation with a Hermite polynomial expansion.
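For orientation, here is a minimal LaTeX sketch of the objects the abstract refers to, written in the standard formulation; the symbols $W$, $U$, $v$, $z^*$ and the exact parameterization are assumptions of this sketch, and the paper's own population Gram matrix and dual activation may differ in detail. A DEQ defines its features implicitly as the fixed point of a weight-tied layer and is trained on the quadratic loss:

\[
z^*(x) = \sigma\bigl(W z^*(x) + U x\bigr),
\qquad
L(\theta) = \frac{1}{2} \sum_{i=1}^{n} \bigl( v^{\top} z^*(x_i) - y_i \bigr)^2 .
\]

In the standard dual-activation construction, expanding $\sigma = \sum_{r \ge 0} a_r h_r$ in the orthonormal Hermite basis $\{h_r\}$ of the standard Gaussian measure gives

\[
\check{\sigma}(\rho)
= \mathbb{E}\bigl[\sigma(X)\,\sigma(Y)\bigr]
= \sum_{r=0}^{\infty} a_r^{2}\, \rho^{r},
\qquad
(X, Y) \sim \mathcal{N}\!\left( 0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right),
\]

so a population Gram matrix applies $\check{\sigma}$ to the pairwise correlations of the inputs. Boundedness of $\sigma'$ and $\sigma''$ is the kind of property that keeps the Hermite coefficients $a_r$, and hence this expansion, well behaved in such an analysis.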


Related research

10/04/2018  Gradient Descent Provably Optimizes Over-parameterized Neural Networks
One of the mysteries in the success of neural networks is randomly initial...

10/14/2018  Variational Neural Networks: Every Layer and Neuron Can Be Unique
The choice of activation function can significantly influence the perfor...

05/27/2022  Global Convergence of Over-parameterized Deep Equilibrium Models
A deep equilibrium model (DEQ) is implicitly defined through an equilibr...

11/13/2019  Quadratic number of nodes is sufficient to learn a dataset via gradient descent
We prove that if an activation function satisfies some mild conditions a...

09/10/2020  Activate or Not: Learning Customized Activation
Modern activation layers use non-linear functions to activate the neuron...

03/13/2019  Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets
Training activation quantized neural networks involves minimizing a piec...

02/06/2020  Global Convergence of Frank Wolfe on One Hidden Layer Networks
We derive global convergence bounds for the Frank Wolfe algorithm when t...
