Stochastic Whitening Batch Normalization

by Shengdong Zhang, et al.

Batch Normalization (BN) is a popular technique for training Deep Neural Networks (DNNs). BN uses scaling and shifting to normalize the activations of mini-batches, which accelerates convergence and improves generalization. The recently proposed Iterative Normalization (IterNorm) method improves these properties by whitening the activations iteratively using Newton's method. However, since Newton's method initializes the whitening matrix independently at each training step, no information is shared between consecutive steps. In this work, instead of computing the whitening matrix exactly at each time step, we estimate it gradually during training in an online fashion, using our proposed Stochastic Whitening Batch Normalization (SWBN) algorithm. We show that while SWBN improves the convergence rate and generalization of DNNs, its computational overhead is lower than that of IterNorm. Because the proposed method is highly efficient, it can be easily employed in most DNN architectures with a large number of layers. We provide comprehensive experiments and comparisons between BN, IterNorm, and SWBN layers to demonstrate the effectiveness of the proposed technique in conventional (many-shot) image classification and few-shot classification tasks.
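
The idea of estimating a whitening matrix online, rather than recomputing it from scratch at every step, can be illustrated with a minimal NumPy sketch. The update rule below is an assumption chosen for illustration: a generic stochastic step that pushes the covariance of the transformed activations toward the identity. It is not claimed to be the exact SWBN update, and the function name `online_whitening_step` and the parameters `lr` and `eps` are hypothetical.

```python
import numpy as np


def online_whitening_step(W, x, lr=0.05, eps=1e-5):
    """One stochastic update of a whitening matrix W from a mini-batch.

    W : (d, d) current whitening matrix estimate, carried across steps.
    x : (n, d) mini-batch of activations.
    Returns the updated W and the whitened mini-batch.

    Illustrative rule only (an assumption, not the paper's exact update):
        W <- W - lr * (W @ cov @ W.T - I) @ W
    """
    # Standardize each feature, as plain batch normalization would.
    xc = x - x.mean(axis=0, keepdims=True)
    xs = xc / np.sqrt(xc.var(axis=0, keepdims=True) + eps)

    # Covariance of the standardized mini-batch.
    cov = xs.T @ xs / xs.shape[0]

    # Small corrective step: the residual is zero when W @ cov @ W.T == I.
    residual = W @ cov @ W.T - np.eye(W.shape[0])
    W = W - lr * residual @ W

    return W, xs @ W.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical mixing matrix that correlates neighbouring features.
    A = np.eye(4) + 0.5 * np.eye(4, k=1)

    W = np.eye(4)  # initialized once, then reused at every step
    for _ in range(400):
        batch = rng.standard_normal((512, 4)) @ A
        W, whitened = online_whitening_step(W, batch)

    print(np.round(whitened.T @ whitened / whitened.shape[0], 2))
```

The point of the sketch is that `W` persists between mini-batches, so each step only refines the previous estimate. This contrasts with restarting an iterative solver from a fresh initialization at every training step.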


