Gradients explode - Deep Networks are shallow - ResNet explained

12/15/2017
by George Philipp, et al.

Whereas it is believed that techniques such as Adam, batch normalization and, more recently, SeLU nonlinearities "solve" the exploding gradient problem, we show that this is not the case in general: in a range of popular MLP architectures, exploding gradients exist, and they limit the depth to which networks can be effectively trained, both in theory and in practice. We explain why exploding gradients occur and highlight the *collapsing domain problem*, which can arise in architectures that avoid exploding gradients. We show that ResNets have significantly lower gradients and thus can circumvent the exploding gradient problem, enabling the effective training of much deeper networks; this is a consequence of a surprising mathematical property. By noticing that *any neural network is a residual network*, we devise the *residual trick*, which reveals that introducing skip connections simplifies the network mathematically, and that this simplicity may be the major cause for their success.
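The mechanism behind these claims can be sketched numerically. In a plain MLP the backward pass multiplies one Jacobian per layer, so any systematic per-layer amplification compounds exponentially with depth, whereas a residual block's Jacobian is the identity plus a small perturbation, which keeps the product far tamer. The sketch below is illustrative only and is not the paper's experiment: the width, the depth, the 1.2 amplification factor for plain layers, and the 0.1 scale of the residual branch are all assumptions made for demonstration.

```python
# Minimal sketch (assumed setup, not the paper's code): compare the norm of
# the product of per-layer Jacobians in a plain deep net vs. a residual one.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 64, 50  # illustrative choices

def gradient_norm(residual: bool) -> float:
    """Frobenius norm of the product of random per-layer Jacobians."""
    grad = np.eye(width)
    for _ in range(depth):
        # Random layer Jacobian with entries of variance 1/width, so it
        # roughly preserves vector norms before any extra scaling.
        W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
        if residual:
            # Residual block: identity plus a weak residual branch (0.1 * W).
            layer_jac = np.eye(width) + 0.1 * W
        else:
            # Plain layer: assumed mild per-layer amplification of 1.2,
            # which compounds exponentially over depth.
            layer_jac = 1.2 * W
        grad = layer_jac @ grad
    return float(np.linalg.norm(grad))

print("plain MLP gradient norm:   ", gradient_norm(residual=False))
print("residual net gradient norm:", gradient_norm(residual=True))
```

Running this typically shows the plain product several orders of magnitude larger than the residual one, which is the qualitative gap the abstract attributes to skip connections.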

Related research

02/28/2017 · The Shattered Gradients Problem: If resnets are the answer, then what is the question?
A long-standing obstacle to progress in deep learning is the problem of ...

02/24/2020 · Batch Normalization Biases Deep Residual Networks Towards Shallow Paths
Batch normalization has multiple benefits. It improves the conditioning ...

02/14/2020 · Skip Connections Matter: On the Transferability of Adversarial Examples Generated with ResNets
Skip connections are an essential component of current state-of-the-art ...

05/20/2016 · Residual Networks Behave Like Ensembles of Relatively Shallow Networks
In this work we propose a novel interpretation of residual networks show...

11/08/2016 · The Loss Surface of Residual Networks: Ensembles and the Role of Batch Normalization
Deep Residual Networks present a premium in performance in comparison to...

08/03/2020 · Making Coherence Out of Nothing At All: Measuring the Evolution of Gradient Alignment
We propose a new metric (m-coherence) to experimentally study the alignm...

10/18/2019 · Are Perceptually-Aligned Gradients a General Property of Robust Classifiers?
For a standard convolutional neural network, optimizing over the input p...
