Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence

10/21/2020
by   Vinay Rao, et al.
We perform an extensive empirical study of the statistical properties of Batch Norm and other common normalizers. This includes an examination of the correlation between representations of minibatches, gradient norms, and Hessian spectra both at initialization and over the course of training. Through this analysis, we identify several statistical properties which appear linked to Batch Norm's superior performance. We propose two simple normalizers, PreLayerNorm and RegNorm, which better match these desirable properties without involving operations along the batch dimension. We show that PreLayerNorm and RegNorm achieve much of the performance of Batch Norm without requiring batch dependence, that they reliably outperform LayerNorm, and that they can be applied in situations where Batch Norm is ineffective.
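For context on what "without involving operations along the batch dimension" means, the standard definitions of the two baselines contrast as follows: Batch Norm computes statistics across examples in the minibatch, while LayerNorm computes them per example across features. A minimal NumPy sketch of both (inference-time normalization only, omitting the learned scale and shift parameters):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature using statistics over the batch dimension
    # (axis 0): every example's output depends on the rest of the batch.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Normalize each example using statistics over its own features
    # (axis -1): no dependence on other examples in the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(8, 16)  # (batch, features)
```

The proposed PreLayerNorm and RegNorm are batch-independent like `layer_norm` above; their exact formulations are given in the full paper and are not reproduced here.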

Related research:

02/10/2017 - Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
05/06/2019 - Batch Normalization is a Cause of Adversarial Vulnerability
11/29/2022 - Disentangling the Mechanisms Behind Implicit Regularization in SGD
01/29/2022 - Maximum Batch Frobenius Norm for Multi-Domain Text Classification
04/15/2018 - Weighted Low-Rank Approximation of Matrices and Background Modeling
10/14/2021 - On some batch code properties of the simplex code
01/01/2020 - A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks
