The Disharmony Between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation Between Activations

04/23/2023
by Inyoung Paik, et al.

Deep neural networks based on batch normalization and ReLU-like activation functions can experience instability in the early stages of training due to the high gradients induced by a temporary gradient explosion. We explain how ReLU reduces variance more than expected, and how batch normalization amplifies the gradient while recovering that variance, causing gradient explosion even though forward propagation remains stable. We also discuss how the dynamics of a deep neural network change during training and how the correlation between inputs can alleviate this problem. Finally, we propose an adaptive learning rate algorithm inspired by second-order optimization methods, which outperforms existing learning rate scaling approaches in large-batch training and can also replace WarmUp in small-batch training.
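The variance-reduction and rescaling mechanism described above can be illustrated numerically. The sketch below is not taken from the paper; it only demonstrates the well-known facts that ReLU applied to a zero-mean, unit-variance input shrinks the variance to 1/2 - 1/(2π) ≈ 0.34, and that the 1/std rescaling batch normalization then applies is the same factor that multiplies gradients in the backward pass, so stacked BN+ReLU blocks can compound it.

```python
# Minimal NumPy sketch (illustrative, not the paper's method):
# ReLU shrinks the variance of a zero-mean, unit-variance input,
# and BN's 1/std rescaling, which restores unit variance in the
# forward pass, acts as a multiplicative factor on gradients
# in the backward pass.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)      # zero-mean, unit-variance pre-activations

h = np.maximum(x, 0.0)                  # ReLU
print("Var[ReLU(x)]       :", h.var())  # ~0.34 = 1/2 - 1/(2*pi), well below 1

scale = 1.0 / h.std()                   # BN divides by std to recover unit variance
print("BN rescale (1/std) :", scale)    # ~1.7

# Roughly, the same 1/std factor multiplies the incoming gradient in backprop,
# so L stacked BN+ReLU blocks amplify gradients on the order of scale**L in this
# worst-case, uncorrelated setting, even though forward activations stay normalized.
L = 20
print("Naive amplification over 20 layers:", scale ** L)
```

In practice the amplification is far milder than this worst case; the paper's thesis is that correlation between activations offsets much of it.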


