Batchless Normalization: How to Normalize Activations with just one Instance in Memory

12/30/2022
by Benjamin Berger, et al.

In training neural networks, batch normalization has many benefits, not all of them entirely understood. But it also has some drawbacks. Foremost is arguably memory consumption: computing the batch statistics requires all instances within the batch to be processed simultaneously, whereas without batch normalization they could be processed one by one while accumulating the weight gradients. Another drawback is that the distribution parameters (mean and standard deviation) are unlike all other model parameters in that they are not trained using gradient descent but require special treatment, complicating implementation. In this paper, I show a simple and straightforward way to address these issues. The idea, in short, is to add terms to the loss that, for each activation, minimize the negative log likelihood of a Gaussian distribution that is used to normalize that activation. Among other benefits, this will hopefully contribute to the democratization of AI research by lowering the hardware requirements for training larger models.
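To make the idea concrete, here is a minimal, illustrative sketch of such a layer in PyTorch. The class name BatchlessNorm, the weighting of the auxiliary term, and the choice to detach the learned statistics during normalization are assumptions made for this sketch, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class BatchlessNorm(nn.Module):
    """Illustrative sketch: each activation is normalized with a learned
    Gaussian (mean, std) whose parameters are trained by an auxiliary
    negative-log-likelihood term added to the loss, so no batch
    statistics are ever computed."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        # Per-activation Gaussian parameters, trained by gradient descent
        # like any other weight (no running averages, no special treatment).
        self.mu = nn.Parameter(torch.zeros(num_features))
        self.log_sigma = nn.Parameter(torch.zeros(num_features))
        # The usual affine parameters of a normalization layer.
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.eps = eps
        self.nll = None  # auxiliary loss term, to be collected by the caller

    def forward(self, x):
        sigma = self.log_sigma.exp() + self.eps
        # Negative log likelihood of the incoming activations under the
        # learned Gaussian (the constant 0.5*log(2*pi) is omitted).
        self.nll = (sigma.log() + 0.5 * ((x - self.mu) / sigma) ** 2).mean()
        # Normalize with the learned statistics. Detaching them here is one
        # possible design choice (an assumption, not taken from the paper):
        # it lets the NLL term alone drive mu and sigma.
        x_hat = (x - self.mu.detach()) / sigma.detach()
        return self.gamma * x_hat + self.beta


# Hypothetical usage: the auxiliary term is added to the task loss,
# and the forward pass works even with a single instance in memory.
norm = BatchlessNorm(16)
x = torch.randn(1, 16)                    # batch size of one
y = norm(x)
loss = y.pow(2).mean() + 0.1 * norm.nll   # task loss + weighted NLL term
loss.backward()
```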

Related research

- Training Deep Neural Networks Without Batch Normalization (08/18/2020): Training neural networks is an optimization problem, and finding a decen...
- Why Regularized Auto-Encoders learn Sparse Representation? (05/21/2015): While the authors of Batch Normalization (BN) identify and address an im...
- Online Normalization for Training Neural Networks (05/15/2019): Online Normalization is a new technique for normalizing the hidden activ...
- Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks (03/04/2016): While the authors of Batch Normalization (BN) identify and address an im...
- Scalable Methods for 8-bit Training of Neural Networks (05/25/2018): Quantized Neural Networks (QNNs) are often used to improve network effic...
- Batch Normalization in the final layer of generative networks (05/18/2018): Generative Networks have shown great promise in generating photo-realist...
- Generalized Batch Normalization: Towards Accelerating Deep Neural Networks (12/08/2018): Utilizing recently introduced concepts from statistics and quantitative ...
