
The Shattered Gradients Problem: If resnets are the answer, then what is the question?
A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. The problem has largely been overcome through the introduction of carefully constructed initializations and batch normalization. Nevertheless, architectures incorporating skip-connections, such as resnets, perform much better than standard feedforward architectures despite well-chosen initialization and batch normalization. In this paper, we identify the shattered gradients problem. Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth, resulting in gradients that resemble white noise. In contrast, the gradients in architectures with skip-connections are far more resistant to shattering, decaying sublinearly. Detailed empirical evidence is presented in support of the analysis, on both fully-connected networks and convnets. Finally, we present a new "looks linear" (LL) initialization that prevents shattering. Preliminary experiments show the new initialization makes it possible to train very deep networks without the addition of skip-connections.
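The "looks linear" idea can be illustrated with a small numpy sketch. This is not the paper's implementation, only an assumed minimal version of the underlying identity: with a concatenated-ReLU nonlinearity [relu(x), relu(-x)] and mirrored weight blocks [W, -W], each layer computes exactly W @ x at initialization, so the network starts out as a linear map and its gradients cannot shatter. The function names `crelu` and `ll_layer` are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def crelu(x):
    # Concatenated ReLU: [relu(x), relu(-x)]. Since relu(x) - relu(-x) == x,
    # the pair preserves all information in x.
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0)], axis=-1)

def ll_layer(W):
    # Mirrored "looks linear" weights [W, -W]: applied to crelu(x) this
    # gives W @ relu(x) - W @ relu(-x) = W @ x, i.e. a linear map at init.
    return np.concatenate([W, -W], axis=1)

d = 8
x = rng.standard_normal(d)
W = rng.standard_normal((d, d))

out = ll_layer(W) @ crelu(x)
assert np.allclose(out, W @ x)  # layer is exactly linear at initialization
```

Once training perturbs the mirrored symmetry, the layers become genuinely nonlinear, so the linearity only holds at (or near) initialization.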