Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

02/25/2016
by   Tim Salimans, et al.
0

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/02/2020

Weight and Gradient Centralization in Deep Neural Networks

Batch normalization is currently the most widely used variant of interna...
research
05/27/2018

Towards a Theoretical Understanding of Batch Normalization

Normalization techniques such as Batch Normalization have been applied v...
research
08/18/2020

Training Deep Neural Networks Without Batch Normalization

Training neural networks is an optimization problem, and finding a decen...
research
01/08/2021

Towards Accelerating Training of Batch Normalization: A Manifold Perspective

Batch normalization (BN) has become a crucial component across diverse d...
research
09/29/2018

On the Convergence and Robustness of Batch Normalization

Despite its empirical success, the theoretical underpinnings of the stab...
research
10/15/2018

Revisit Batch Normalization: New Understanding from an Optimization View and a Refinement via Composition Optimization

Batch Normalization (BN) has been used extensively in deep learning to a...
research
09/28/2022

Breaking Time Invariance: Assorted-Time Normalization for RNNs

Methods such as Layer Normalization (LN) and Batch Normalization (BN) ha...

Please sign up or login with your details

Forgot password? Click here to reset