Diagonal Rescaling For Neural Networks

05/25/2017
by Jean Lafond, et al.

We define a second-order neural network stochastic gradient training algorithm whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why this algorithm lacks robustness then reveals two interesting insights. The first insight suggests a new way to scale the stepsizes, clarifying popular algorithms such as RMSProp as well as old neural network tricks such as fan-in stepsize scaling. The second insight stresses the practical importance of dealing with fast changes in the curvature of the cost.
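
Both insights concern how per-parameter stepsizes should be scaled. As a rough illustration only, and not the paper's block-diagonal algorithm, the sketch below combines an RMSProp-style diagonal rescaling of the gradient with the classic fan-in stepsize-scaling trick; the function name, hyperparameters, and toy usage are our own assumptions.

```python
import numpy as np

def diagonal_rescaled_sgd_step(w, grad, state, lr=1e-3, beta=0.99, eps=1e-8, fan_in=None):
    """One illustrative diagonally rescaled SGD update (hypothetical helper)."""
    # Running average of squared gradients: a cheap diagonal curvature proxy.
    state["v"] = beta * state.get("v", np.zeros_like(w)) + (1.0 - beta) * grad ** 2
    # Classic fan-in trick: shrink the base stepsize for units with many inputs.
    step = lr / fan_in if fan_in else lr
    # Per-parameter (diagonal) rescaling of the gradient, RMSProp style.
    return w - step * grad / (np.sqrt(state["v"]) + eps)

# Toy usage: one update of a 256x128 weight matrix (fan-in 128).
w = np.random.randn(256, 128) / np.sqrt(128)
g = np.random.randn(*w.shape)  # stand-in for a minibatch gradient
state = {}
w = diagonal_rescaled_sgd_step(w, g, state, fan_in=128)
```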

Related research

Block-diagonal Hessian-free Optimization for Training Neural Networks (12/20/2017)
Second-order methods for neural network optimization have several advant...

BDA-PCH: Block-Diagonal Approximation of Positive-Curvature Hessian for Training Neural Networks (02/19/2018)
We propose a block-diagonal approximation of the positive-curvature Hess...

Optimizing Neural Networks with Kronecker-factored Approximate Curvature (03/19/2015)
We propose an efficient method for approximating natural gradient descen...

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale (09/12/2023)
Shampoo is an online and stochastic optimization algorithm belonging to ...

Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks (03/31/2023)
As a second-order method, the Natural Gradient Descent (NGD) has the abi...

Scalable Stochastic Gradient Riemannian Langevin Dynamics in Non-Diagonal Metrics (03/09/2023)
Stochastic-gradient sampling methods are often used to perform Bayesian ...

Regularised neural networks mimic human insight (02/22/2023)
Humans sometimes show sudden improvements in task performance that have ...
