On the Ideal Number of Groups for Isometric Gradient Propagation

02/07/2023
by Bum Jun Kim, et al.

Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization generalizes layer normalization and instance normalization by introducing a degree of freedom: the number of groups. However, finding the optimal number of groups requires trial-and-error hyperparameter tuning, and such experiments are time-consuming. In this study, we present a principled method for setting the number of groups. First, we observe that the number of groups influences the gradient behavior of the group normalization layer. Based on this observation, we derive the ideal number of groups, which calibrates the gradient scale to facilitate gradient descent optimization. Our proposed number of groups is theoretically grounded, architecture-aware, and provides a suitable value for every layer in a layer-wise manner. The proposed method exhibited improved performance over existing methods across numerous neural network architectures, tasks, and datasets.
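To make the family relationship concrete, below is a minimal PyTorch sketch; it is an illustration only, not the paper's implementation, and the group counts and tensor shapes are arbitrary choices for demonstration. It shows that group normalization with one group behaves like layer normalization, that one group per channel behaves like instance normalization, and that the scale of the gradient reaching the input varies with the group count, which is the quantity the paper's derivation calibrates:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Group normalization spans a family of normalizers via its group count:
    # num_groups=1 recovers layer normalization (one group over all channels),
    # while num_groups=C recovers instance normalization (one group per channel).
    C = 64
    x = torch.randn(8, C, 32, 32)      # (batch, channels, height, width)
    g_out = torch.randn(8, C, 32, 32)  # fixed upstream gradient for a fair probe

    for groups in (1, 16, C):          # LN-like, intermediate, IN-like
        gn = nn.GroupNorm(num_groups=groups, num_channels=C)
        inp = x.clone().requires_grad_(True)
        gn(inp).backward(g_out)
        # The gradient scale flowing back to the input changes with the group
        # count; calibrating this scale is what the derived group count targets.
        print(f"groups={groups:3d}  input-grad norm={inp.grad.norm():.4f}")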

