Channel Normalization in Convolutional Neural Networks avoids Vanishing Gradients

07/22/2019
by Zhenwei Dai et al.

Normalization layers are widely used in deep neural networks to stabilize training. In this paper, we consider the training of convolutional neural networks with gradient descent on a single training example. This optimization problem arises in recent approaches for solving inverse problems, such as the deep image prior or the deep decoder. We show that for this setup, channel normalization, which centers and normalizes each channel individually, avoids vanishing gradients, whereas without normalization the gradients vanish, preventing efficient optimization. This effect prevails even in deep single-channel linear convolutional networks, and we show that without channel normalization, gradient descent takes at least exponentially many steps to come close to an optimum. By contrast, with channel normalization the gradients remain bounded, thus also avoiding exploding gradients.
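
To make the setup concrete, below is a minimal sketch of channel normalization and of the single-training-example optimization the abstract describes. The (N, C, H, W) tensor layout, the epsilon constant, the three-layer architecture, and the learning rate are all illustrative assumptions, not the paper's exact construction (the deep decoder, for instance, uses upsampling and 1x1 convolutions).

```python
import torch
import torch.nn as nn

class ChannelNorm(nn.Module):
    """Center and normalize each channel of a feature map individually,
    per sample, as described in the abstract (no batch statistics)."""
    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps  # illustrative constant for numerical stability

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Statistics over the spatial dimensions of each channel.
        mean = x.mean(dim=(2, 3), keepdim=True)
        var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps)

# Deep-image-prior-style setup: fit a single target image by gradient
# descent, starting from a fixed random input (hypothetical small net).
net = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=3, padding=1), ChannelNorm(), nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1), ChannelNorm(), nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)
z = torch.randn(1, 8, 32, 32)       # fixed random input
target = torch.randn(1, 1, 32, 32)  # the single training example
opt = torch.optim.SGD(net.parameters(), lr=0.01)  # plain gradient descent
for step in range(200):
    opt.zero_grad()
    loss = ((net(z) - target) ** 2).mean()
    loss.backward()
    opt.step()
```

Removing the two ChannelNorm layers corresponds to the unnormalized setting the paper contrasts against, in which gradients shrink with depth and optimization stalls.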


