Generalized Batch Normalization: Towards Accelerating Deep Neural Networks

12/08/2018
by Xiaoyong Yuan, et al.

Utilizing recently introduced concepts from statistics and quantitative risk management, we present a general variant of Batch Normalization (BN) that offers accelerated convergence of neural network training compared to conventional BN. In general, we show that the mean and standard deviation are not always the most appropriate choices for the centering and scaling procedure within the BN transformation, particularly if a ReLU follows the normalization step. We present a Generalized Batch Normalization (GBN) transformation, which can utilize a variety of alternative deviation measures for scaling and statistics for centering, choices which arise naturally from the theory of generalized deviation measures and from risk theory in general. When used in conjunction with the ReLU non-linearity, the underlying risk theory suggests natural, arguably optimal choices for the deviation measure and statistic. Using the suggested deviation measure and statistic, we show experimentally that training is accelerated more than with conventional BN, often with improved error rates as well. Overall, we propose a more flexible BN transformation supported by a complementary theoretical framework that can potentially guide design choices.
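The abstract does not spell out the transformation itself, but a minimal sketch of the idea might look like the following: a per-feature BN-style transform in which the mean and standard deviation of standard BN are replaced by a pluggable centering statistic and deviation measure. The function names and the median / mean-absolute-deviation pair below are illustrative assumptions, not necessarily the statistic and deviation measure the paper recommends for ReLU networks.

```python
import numpy as np

def generalized_batch_norm(x, center_fn, deviation_fn, gamma=1.0, beta=0.0, eps=1e-5):
    """BN-style transform with a pluggable centering statistic and deviation
    measure (a sketch of the GBN idea; not the paper's exact formulation)."""
    c = center_fn(x)                     # centering statistic (mean in standard BN)
    d = deviation_fn(x)                  # deviation measure (std in standard BN)
    x_hat = (x - c) / (d + eps)          # normalize each feature over the batch
    return gamma * x_hat + beta          # learnable affine transform, as in BN

# One illustrative alternative pair: median centering with mean absolute
# deviation scaling (an assumption for demonstration only).
def median_center(a):
    return np.median(a, axis=0)

def mean_abs_deviation(a):
    return np.mean(np.abs(a - np.median(a, axis=0)), axis=0)

if __name__ == "__main__":
    batch = np.random.randn(128, 16)     # 128 examples, 16 features

    # Standard BN corresponds to mean centering and standard-deviation scaling.
    bn_out = generalized_batch_norm(
        batch,
        center_fn=lambda a: a.mean(axis=0),
        deviation_fn=lambda a: a.std(axis=0))

    # An alternative centering/scaling pair plugged into the same transform.
    gbn_out = generalized_batch_norm(batch, median_center, mean_abs_deviation)
    print(bn_out.std(axis=0)[:4], gbn_out.std(axis=0)[:4])
```

The point of the sketch is only that the centering and scaling statistics are interchangeable parameters of the transform; the paper's contribution is the risk-theoretic framework for choosing them, particularly when the normalization is followed by a ReLU.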

Related research

06/28/2023 - Expectile Quadrangle and Applications
The paper explores the concept of the expectile risk measure within the ...

05/21/2015 - Why Regularized Auto-Encoders learn Sparse Representation?
While the authors of Batch Normalization (BN) identify and address an im...

07/31/2019 - An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation
Batch normalization has been widely used to improve optimization in deep...

04/23/2023 - The Disharmony Between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation Between Activations
Deep neural networks based on batch normalization and ReLU-like activati...

06/10/2019 - Scaling Laws for the Principled Design, Initialization and Preconditioning of ReLU Networks
In this work, we describe a set of rules for the design and initializati...

12/30/2022 - Batchless Normalization: How to Normalize Activations with just one Instance in Memory
In training neural networks, batch normalization has many benefits, not ...

10/27/2017 - Revisit Fuzzy Neural Network: Demystifying Batch Normalization and ReLU with Generalized Hamming Network
We revisit fuzzy neural network with a cornerstone notion of generalized...
