Path-SGD: Path-Normalized Optimization in Deep Neural Networks

06/08/2015
by Behnam Neyshabur, et al.

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.
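
To make the idea concrete, below is a minimal sketch (not the paper's code) of a path-normalized update for a two-layer ReLU network: each gradient entry is divided by the diagonal term of the squared path regularizer for that weight, which makes the step insensitive to node-wise rescalings of the weights. The names path_scaling, path_sgd_step, W1, and W2, and the two-layer setup itself, are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of one path-normalized update for a two-layer ReLU network
# y = W2 @ relu(W1 @ x), with W1 of shape (hidden, in) and W2 of shape (out, hidden).
# The names and the two-layer setup are illustrative assumptions, not the paper's code.

def path_scaling(W1, W2):
    """Diagonal terms of the squared path regularizer for each weight."""
    # A weight W1[j, i] lies on paths i -> j -> k; the other edge on each such
    # path is W2[k, j], so its scaling is sum_k W2[k, j]**2 (the same for every i).
    s1 = np.broadcast_to(np.sum(W2 ** 2, axis=0)[:, None], W1.shape)
    # A weight W2[k, j] lies on paths i -> j -> k; its scaling is sum_i W1[j, i]**2.
    s2 = np.broadcast_to(np.sum(W1 ** 2, axis=1)[None, :], W2.shape)
    return s1, s2

def path_sgd_step(W1, W2, g1, g2, lr=0.1, eps=1e-8):
    """Divide each gradient entry by its path term before the SGD step."""
    s1, s2 = path_scaling(W1, W2)
    W1 = W1 - lr * g1 / (s1 + eps)
    W2 = W2 - lr * g2 / (s2 + eps)
    return W1, W2
```

For deeper networks the same per-weight scalings can be obtained by forward- and backward-propagating squared weights through the architecture, so the cost is comparable to an extra forward/backward pass.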

Related Research

05/23/2016
Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
We investigate the parameter-space geometry of recurrent neural networks...

01/02/2019
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Stochastic gradient descent (SGD) has been found to be surprisingly effe...

09/13/2017
Normalized Direction-preserving Adam
Optimization algorithms for training deep models not only affects the co...

07/25/2021
SGD May Never Escape Saddle Points
Stochastic gradient descent (SGD) has been deployed to solve highly non-...

10/18/2019
Interpreting Basis Path Set in Neural Networks
Based on basis path set, G-SGD algorithm significantly outperforms conve...

10/23/2022
K-SAM: Sharpness-Aware Minimization at the Speed of SGD
Sharpness-Aware Minimization (SAM) has recently emerged as a robust tech...

09/27/2021
Unrolling SGD: Understanding Factors Influencing Machine Unlearning
Machine unlearning is the process through which a deployed machine learn...
