The Shattered Gradients Problem: If resnets are the answer, then what is the question?

02/28/2017
by David Balduzzi, et al.

A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. The problem has largely been overcome through carefully constructed initializations and batch normalization. Nevertheless, architectures incorporating skip-connections such as resnets perform much better than standard feedforward architectures despite well-chosen initialization and batch normalization. In this paper, we identify the shattered gradients problem. Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth, resulting in gradients that resemble white noise. In contrast, the gradients in architectures with skip-connections are far more resistant to shattering, decaying sublinearly. Detailed empirical evidence is presented in support of the analysis, on both fully-connected networks and convnets. Finally, we present a new "looks linear" (LL) initialization that prevents shattering. Preliminary experiments show that the new initialization allows very deep networks to be trained without the addition of skip-connections.
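The shattering effect described in the abstract can be illustrated numerically: evaluate the gradient of a randomly initialized deep ReLU network with respect to a one-dimensional input at a grid of points, and check how correlated neighbouring gradients are. The sketch below is an illustration only, not the paper's code; the width, depths, He-style initialization, and the simple h + relu(Wh) residual block are assumptions made for this example.

```python
import numpy as np

def grad_wrt_input(x_points, depth, width=200, skip=False, seed=0):
    """dy/dx of a random scalar-to-scalar ReLU net, evaluated at each input."""
    rng = np.random.default_rng(seed)
    w_in = rng.normal(0.0, 1.0, size=width)                        # 1 -> width
    Ws = [rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
          for _ in range(depth)]                                   # He-style init
    w_out = rng.normal(0.0, np.sqrt(1.0 / width), size=width)      # width -> 1

    grads = []
    for x in x_points:
        h = w_in * x
        dh_dx = w_in.copy()                       # Jacobian of h w.r.t. x
        for W in Ws:
            pre = W @ h
            mask = (pre > 0).astype(float)        # ReLU derivative
            if skip:                              # residual block: h + relu(Wh)
                h = h + np.maximum(pre, 0.0)
                dh_dx = dh_dx + mask * (W @ dh_dx)
            else:                                 # plain feedforward: relu(Wh)
                h = np.maximum(pre, 0.0)
                dh_dx = mask * (W @ dh_dx)
        grads.append(float(w_out @ dh_dx))        # dy/dx at this input
    return np.array(grads)

xs = np.linspace(-2.0, 2.0, 256)
for depth in (2, 10, 30):
    g_ff = grad_wrt_input(xs, depth, skip=False)
    g_sk = grad_wrt_input(xs, depth, skip=True)
    # correlation of gradients at neighbouring inputs:
    # close to 1 = smooth, close to 0 = shattered (white-noise-like)
    c_ff = np.corrcoef(g_ff[:-1], g_ff[1:])[0, 1]
    c_sk = np.corrcoef(g_sk[:-1], g_sk[1:])[0, 1]
    print(f"depth {depth:2d}: feedforward {c_ff:+.2f}   with skips {c_sk:+.2f}")
```

As depth grows, the neighbouring-gradient correlation of the plain feedforward net should fall toward zero, while the skip-connection variant should remain markedly higher, which is the qualitative behaviour the abstract describes.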

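The "looks linear" (LL) initialization mentioned in the abstract makes the network compute an exact linear map at initialization, so gradients cannot shatter before training begins. Roughly, it pairs a concatenated-ReLU (CReLU) activation with mirrored weight blocks. The minimal sketch below (a single layer, with sizes and names chosen purely for illustration) checks the identity this relies on.

```python
import numpy as np

rng = np.random.default_rng(1)
width = 64
h = rng.normal(size=width)                        # some hidden activation

# CReLU-style activation: concatenate relu(h) and relu(-h)
crelu = np.concatenate([np.maximum(h, 0.0), np.maximum(-h, 0.0)])

# mirrored "looks linear" weight block: W_ll = [W0, -W0]
W0 = rng.normal(0.0, np.sqrt(1.0 / width), size=(width, width))
W_ll = np.concatenate([W0, -W0], axis=1)

# at initialization the nonlinear layer equals the linear map W0 @ h,
# because relu(h) - relu(-h) == h holds coordinate-wise
assert np.allclose(W_ll @ crelu, W0 @ h)
print("LL block output equals the linear map at initialization")
```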

Related research

02/21/2019  A Mean Field Theory of Batch Normalization
12/15/2017  Gradients explode - Deep Networks are shallow - ResNet explained
11/07/2018  Characterizing Well-behaved vs. Pathological Deep Neural Network Architectures
08/18/2021  Generalizing MLPs With Dropouts, Batch Normalization, and Skip Connections
01/31/2017  Skip Connections Eliminate Singularities
10/23/2020  Population Gradients improve performance across data-sets and architectures in object classification