Scale Normalization

04/26/2016
by Henry Z. Lo, et al.

One of the difficulties of training deep neural networks stems from improper scaling between layers. Scaling issues introduce exploding and vanishing gradient problems, and have typically been addressed by careful scale-preserving initialization. We investigate the value of preserving scale, or isometry, beyond the initial weights. We propose two methods of maintaining isometry, one exact and one stochastic. Preliminary experiments show that both determinant normalization and scale normalization effectively speed up learning. Results suggest that isometry is important at the beginning of learning, and that maintaining it leads to faster learning.
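The abstract does not spell out the normalization rules, so the sketch below is only an assumed illustration of the idea of maintaining isometry during training: a stochastic variant that estimates how much a weight matrix stretches random unit vectors and rescales it accordingly, and an exact variant that rescales a square matrix to unit absolute determinant. The function names and all details are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): keep layer weights roughly
# isometric during training by rescaling them after each update.
import numpy as np

rng = np.random.default_rng(0)

def scale_normalize(W, n_probes=8):
    """Stochastic variant (assumed): estimate how much W stretches random
    unit vectors and divide by that factor so ||W x|| ~ ||x|| on average."""
    d = W.shape[1]
    x = rng.standard_normal((d, n_probes))
    x /= np.linalg.norm(x, axis=0, keepdims=True)   # unit-norm probe vectors
    gain = np.linalg.norm(W @ x, axis=0).mean()     # average stretch factor
    return W / gain

def determinant_normalize(W):
    """Exact variant (assumed): rescale a square W so |det W| = 1,
    i.e. the layer mapping is volume-preserving."""
    n = W.shape[0]
    _, logabsdet = np.linalg.slogdet(W)             # numerically stable log|det W|
    return W / np.exp(logabsdet / n)

# Usage: apply after each SGD step so layer scales do not drift.
W = 0.1 * rng.standard_normal((256, 256))
W = scale_normalize(W)          # or determinant_normalize(W) for square layers
x = rng.standard_normal(256)
print(np.linalg.norm(W @ x) / np.linalg.norm(x))    # close to 1.0 after normalization
```

Under these assumptions, the stochastic variant only needs a few matrix-vector products per layer, while the determinant-based variant is exact but restricted to square weight matrices.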


Related research

06/19/2020
Robust Differentially Private Training of Deep Neural Networks
Differentially private stochastic gradient descent (DPSGD) is a variatio...

06/16/2020
New Interpretations of Normalization Methods in Deep Learning
In recent years, a variety of normalization methods have been proposed t...

10/04/2019
Farkas layers: don't shift the data, fix the geometry
Successfully training deep neural networks often requires either batch n...

03/22/2023
An Empirical Analysis of the Shift and Scale Parameters in BatchNorm
Batch Normalization (BatchNorm) is a technique that improves the trainin...

04/09/2022
FoundationLayerNorm: Scaling BERT and GPT to 1,000 Layers
The mainstream BERT/GPT model contains only 10 to 20 layers, and there i...

02/07/2023
On the Ideal Number of Groups for Isometric Gradient Propagation
Recently, various normalization layers have been proposed to stabilize t...

05/24/2017
Properties of Normalization for a math based intermediate representation
The Normalization transformation plays a key role in the compilation of ...
