Expected Gradients of Maxout Networks and Consequences to Parameter Initialization

01/17/2023
by Hanna Tseran, et al.

We study the gradients of a maxout network with respect to the inputs and parameters and obtain bounds on their moments depending on the architecture and the parameter distribution. We observe that the distribution of the input-output Jacobian depends on the input, which complicates stable parameter initialization. Based on the moments of the gradients, we formulate parameter initialization strategies that avoid vanishing and exploding gradients in wide networks. Experiments with deep fully-connected and convolutional networks show that this strategy improves SGD and Adam training of deep maxout networks. In addition, we obtain refined bounds on the expected number of linear regions, results on the expected curve length distortion, and results on the neural tangent kernel (NTK).
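
To make the initialization idea concrete, below is a minimal sketch of a maxout layer with a fan-in-scaled Gaussian initialization in PyTorch. The class name `Maxout`, the constant `c`, and the widths in the quick check are illustrative assumptions, not the paper's exact code or constants; the point is only that the weight variance is scaled as a maxout-dependent constant divided by the fan-in.

```python
# Minimal sketch, not the paper's implementation: a maxout layer whose weights are
# drawn from a zero-mean Gaussian with variance c / fan_in. The default `c` here is
# an assumed placeholder; the paper derives its value from the maxout rank K.
import torch
import torch.nn as nn


class Maxout(nn.Module):
    """Maxout unit: max over `rank` affine pre-activations per output feature."""

    def __init__(self, in_features: int, out_features: int, rank: int, c: float = 0.5):
        super().__init__()
        self.rank = rank
        self.linear = nn.Linear(in_features, out_features * rank)
        # Zero-mean Gaussian weights with variance c / fan_in, biases set to zero.
        nn.init.normal_(self.linear.weight, mean=0.0, std=(c / in_features) ** 0.5)
        nn.init.zeros_(self.linear.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.linear(x)                        # (..., out_features * rank)
        z = z.view(*z.shape[:-1], -1, self.rank)  # (..., out_features, rank)
        return z.max(dim=-1).values


# Quick check of the gradient scale through a deep, wide stack of such layers.
if __name__ == "__main__":
    torch.manual_seed(0)
    width, depth, rank = 512, 20, 5
    net = nn.Sequential(*[Maxout(width, width, rank) for _ in range(depth)])
    x = torch.randn(64, width, requires_grad=True)
    net(x).pow(2).mean().backward()
    print("input-gradient norm:", x.grad.norm().item())
```

Varying `c` in this sketch makes the effect visible: too small a value shrinks the input gradient with depth, too large a value blows it up, which is the vanishing/exploding behavior the initialization strategy is designed to avoid.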

Related research

01/11/2018
Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?
We give a rigorous analysis of the statistical behavior of gradients in ...

02/21/2021
Deep ReLU Networks Preserve Expected Length
Assessing the complexity of functions computed by a neural network helps...

02/28/2017
The Shattered Gradients Problem: If resnets are the answer, then what is the question?
A long-standing obstacle to progress in deep learning is the problem of ...

12/14/2018
Products of Many Large Random Matrices and Gradients in Deep Neural Networks
We study products of random matrices in the regime where the number of t...

11/07/2018
Characterizing Well-behaved vs. Pathological Deep Neural Network Architectures
We introduce a principled approach, requiring only mild assumptions, for...

03/05/2018
How to Start Training: The Effect of Initialization and Architecture
We investigate the effects of initialization and architecture on the sta...
