Characterizing Well-behaved vs. Pathological Deep Neural Network Architectures

11/07/2018
by Antoine Labatie, et al.

We introduce a principled approach, requiring only mild assumptions, for the characterization of deep neural networks at initialization. Our approach applies both to fully-connected and convolutional networks and incorporates the commonly used techniques of batch normalization and skip-connections. Our key insight is to consider the evolution with depth of statistical moments of signal and sensitivity, thereby characterizing the well-behaved or pathological behaviour of input-output mappings encoded by different choices of architecture. We establish: (i) for feedforward networks with and without batch normalization, depth multiplicativity inevitably leads to ill-behaved moments and distributional pathologies; (ii) for residual networks, on the other hand, the mechanism of identity skip-connection induces power-law rather than exponential behaviour, leading to well-behaved moments and no distributional pathology.
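The contrast the abstract draws can be made concrete with a small simulation. Below is a minimal sketch (not the authors' code), with illustrative assumptions throughout: width 128, depth 50, He initialization, and an RMS rescaling of the residual branch as a crude stand-in for batch normalization. It propagates a fixed input through many random weight draws and tracks the normalized second moment E[q^2]/E[q]^2 of the signal's mean square q = ||x||^2 / width: under the multiplicative feedforward dynamics this ratio compounds geometrically with depth, while the additive residual dynamics keep it near 1 and grow E[q] only linearly in depth, i.e. as a power law.

```python
# Minimal sketch (illustrative assumptions, not the authors' implementation):
# compare moment growth with depth for a plain ReLU feedforward network vs. a
# residual network with identity skip-connections at random initialization.
import numpy as np

rng = np.random.default_rng(0)
width, depth, draws = 128, 50, 200

x0 = rng.standard_normal(width)
q_ff = np.zeros((draws, depth))   # signal second moment per layer, feedforward
q_res = np.zeros((draws, depth))  # signal second moment per layer, residual

for d in range(draws):
    f, r = x0.copy(), x0.copy()
    for l in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)  # He init
        # Feedforward: the signal is fully re-transformed at every layer, so
        # per-layer fluctuations multiply and compound with depth.
        f = np.maximum(f @ W, 0.0)
        # Residual: the identity skip-connection passes the signal through and
        # only adds a rescaled branch, so contributions accumulate additively.
        branch = np.maximum(r @ W, 0.0)
        branch /= np.sqrt(np.mean(branch ** 2))  # BN-like rescaling (assumption)
        r = r + branch
        q_ff[d, l], q_res[d, l] = np.mean(f ** 2), np.mean(r ** 2)

for l in (9, 24, 49):
    ratio_ff = np.mean(q_ff[:, l] ** 2) / np.mean(q_ff[:, l]) ** 2
    ratio_res = np.mean(q_res[:, l] ** 2) / np.mean(q_res[:, l]) ** 2
    print(f"depth {l + 1:2d}:  E[q^2]/E[q]^2  feedforward {ratio_ff:5.2f}"
          f"   residual {ratio_res:5.3f}   E[q] residual {np.mean(q_res[:, l]):6.1f}")
```

Under these assumptions the feedforward ratio should grow roughly geometrically (for ReLU, about a factor 1 + 5/width per layer) while the residual ratio stays close to 1 and E[q] grows roughly linearly, mirroring the ill-behaved vs. well-behaved moments described above.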

Related research

The Shattered Gradients Problem: If resnets are the answer, then what is the question? (02/28/2017)
A long-standing obstacle to progress in deep learning is the problem of ...

A Mean Field Theory of Batch Normalization (02/21/2019)
We develop a mean field theory for batch normalization in fully-connected ...

Batch Normalization Biases Deep Residual Networks Towards Shallow Paths (02/24/2020)
Batch normalization has multiple benefits. It improves the conditioning ...

Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization (02/20/2023)
Stacking many layers to create truly deep neural networks is arguably wh...

Rethinking Skip Connection with Layer Normalization in Transformers and ResNets (05/15/2021)
Skip connection is a widely-used technique to improve the performance a...

Expected Gradients of Maxout Networks and Consequences to Parameter Initialization (01/17/2023)
We study the gradients of a maxout network with respect to inputs and pa...

Spectral Analysis of the Neural Tangent Kernel for Deep Residual Networks (04/07/2021)
Deep residual network architectures have been shown to achieve superior ...
