Normalization effects on deep neural networks

09/02/2022
by Jiahui Yu, et al.

We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer i with N_i hidden units is allowed to be normalized by 1/N_i^γ_i with γ_i ∈ [1/2,1], and we study the effect of the choice of the γ_i on the statistical behavior of the neural network's output (such as its variance) as well as on the test accuracy on the MNIST data set. We find that, in terms of both the variance of the network's output and test accuracy, the best choice is to set the γ_i equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer: the network's behavior is more sensitive to the scaling of the outer layer than to the scaling of the inner layers. The mathematical analysis rests on an asymptotic expansion of the neural network's output. An important practical consequence of the analysis is that it provides a systematic, mathematically informed way to choose the learning-rate hyperparameters; such a choice guarantees that the neural network behaves in a statistically robust way as the N_i grow to infinity.
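As a concrete illustration of the scaling the abstract describes, here is a minimal NumPy sketch of such a normalized forward pass. The function name, the layer widths, the tanh activation, and the absence of biases are illustrative assumptions rather than the paper's actual architecture or code; the only point is where the factor 1/N_i^γ_i enters.

```python
import numpy as np

def forward(x, weights, gammas, activation=np.tanh):
    """Feed-forward pass with layer-wise normalization 1/N_i**gamma_i.

    A sketch under illustrative assumptions (tanh hidden units, no
    biases).  Each gamma_i in [1/2, 1] controls the normalization of
    the sum over the N_i units feeding the next layer.
    """
    *hidden, readout = weights
    h = x
    for W, gamma in zip(hidden, gammas):
        # Divide the linear combination over the N_i incoming units
        # by N_i**gamma_i before applying the activation.
        h = activation(W @ h / h.shape[0] ** gamma)
    # Outer (readout) layer: the abstract finds the output is most
    # sensitive to this layer's scaling.
    return readout @ h / h.shape[0] ** gammas[-1]

rng = np.random.default_rng(0)
widths = [784, 500, 500, 10]            # illustrative MNIST-sized widths
weights = [rng.normal(size=(m, n))      # O(1) weight entries; all scaling
           for n, m in zip(widths[:-1], widths[1:])]  # comes from the gammas
y = forward(np.ones(784), weights, gammas=[1.0, 1.0, 1.0])  # mean-field
```

Setting all γ_i to 1/2 recovers the familiar 1/√N scaling, while γ_i = 1 gives the mean-field scaling that the abstract identifies as the statistically preferable choice, particularly for the outer layer. The learning-rate prescription mentioned above is tied to these same exponents; the precise per-layer powers of N_i are derived in the full text.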
