
## 1 Introduction

In contrast with the growing complexity of neural network architectures (Szegedy et al., 2015; He et al., 2016; Hu et al., 2017), the training methods remain relatively simple. Most practical optimization methods for deep neural networks (DNNs) are based on the stochastic gradient descent (SGD) algorithm. However, the learning rate of SGD, as a hyperparameter, is often difficult to tune, since the magnitudes of different parameters can vary widely, and adjustment is required throughout the training process.

To tackle this problem, several adaptive variants of SGD have been developed, including Adagrad (Duchi et al., 2011), Adadelta (Zeiler, 2012), RMSprop (Tieleman & Hinton, 2012), Adam (Kingma & Ba, 2014), etc. These algorithms aim to adapt the learning rate to different parameters automatically, by normalizing the global learning rate based on historical statistics of the gradient w.r.t. each parameter. Although these algorithms can usually simplify learning rate settings and lead to faster convergence, it is observed that their generalization performance tends to be significantly worse than that of SGD in some scenarios (Wilson et al., 2017). This intriguing phenomenon may explain why SGD (possibly with momentum) is still prevalent in training state-of-the-art deep models, especially feedforward DNNs (Szegedy et al., 2015; He et al., 2016; Hu et al., 2017). Furthermore, recent work has shown that DNNs are capable of fitting noise data (Zhang et al., 2017), suggesting that their generalization capabilities are not the mere result of DNNs themselves, but are entwined with optimization (Arpit et al., 2017).

When batch normalization or linear rectifiers are used, the function of each hidden unit is invariant to the scale of its input weight vector, yet the overall network function still varies with the magnitudes of parameters. As we show, however, this problem can be partially avoided by using SGD with L2 weight decay, which implicitly normalizes the weight vectors, such that the magnitude of each vector's direction change does not depend on its L2-norm.

Next, we propose the normalized direction-preserving Adam (ND-Adam) algorithm, which preserves the direction of the gradient w.r.t. each weight vector, and incorporates a special form of weight normalization (Salimans & Kingma, 2016). By using ND-Adam, we are able to achieve significantly better generalization performance than vanilla Adam, and at the same time, obtain much lower training loss at convergence, compared to SGD with L2 weight decay.

Furthermore, we find that, without proper control, the learning signal backpropagated from the softmax layer varies with the overall magnitude of the logits. Based on this observation, we apply batch normalization to the logits with a single tunable scaling factor, which further improves the generalization performance in classification tasks.

In essence, our proposed methods, ND-Adam and batch-normalized softmax, enable more precise control over the directions of parameter updates, the learning rates, and the learning signals.

## 2 Background and Motivation

### 2.1 Adam

Adam (Kingma & Ba, 2014) is a stochastic optimization method that applies individual adaptive learning rates to different parameters, based on estimates of the first and second moments of the gradients. Specifically, for trainable parameters, $\theta$, Adam maintains a running average of the first and second moments of the gradient w.r.t. each parameter as

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \quad\text{(1a)}$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2. \quad\text{(1b)}$$

Here, $t$ denotes the time step, $m_t$ and $v_t$ denote respectively the first and second moments, and $\beta_1$ and $\beta_2$ are the corresponding decay factors. Kingma & Ba (2014) further notice that, since $m_t$ and $v_t$ are initialized to $0$'s, they are biased towards zero during the initial time steps, especially when the decay factors are large (i.e., close to $1$). Thus, for computing the next update, they need to be corrected as

$$\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t}, \quad\text{(2)}$$

where $\beta_1^t$, $\beta_2^t$ are the $t$-th powers of $\beta_1$, $\beta_2$, respectively. Then, we can update each parameter as

$$\theta_t = \theta_{t-1} - \frac{\alpha_t}{\sqrt{\hat v_t} + \epsilon}\, \hat m_t, \quad\text{(3)}$$

where $\alpha_t$ is the global learning rate, and $\epsilon$ is a small constant to avoid division by zero. Note that the above computations between vectors are element-wise.

A distinguishing merit of Adam is that the magnitudes of parameter updates are invariant to rescaling of the gradient, as shown by the adaptive learning rate term, $\alpha_t / \left(\sqrt{\hat v_t} + \epsilon\right)$. However, there are two potential problems when applying Adam to DNNs.
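As a concrete illustration, the update rule of Eqs. (1)-(3) for a single scalar parameter can be sketched as follows; this is a minimal, library-free sketch, not tied to any particular implementation:

```python
import math

def adam_step(theta, g, m, v, t,
              alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter theta,
    given the gradient g at time step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * g        # first moment, Eq. (1a)
    v = beta2 * v + (1 - beta2) * g ** 2   # second moment, Eq. (1b)
    m_hat = m / (1 - beta1 ** t)           # bias correction, Eq. (2)
    v_hat = v / (1 - beta2 ** t)
    theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)  # Eq. (3)
    return theta, m, v
```

The rescaling invariance is easy to see here: at $t = 1$, the update is approximately $-\alpha \cdot \mathrm{sign}(g)$, regardless of the gradient's magnitude.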

First, in some scenarios, DNNs trained with Adam generalize worse than those trained with stochastic gradient descent (SGD) (Wilson et al., 2017). Zhang et al. (2017) demonstrate that over-parameterized DNNs are capable of memorizing the entire dataset, whether it consists of natural data or meaningless noise, and thus suggest that much of the generalization power of DNNs comes from the training algorithm, e.g., SGD and its variants. This coincides with another recent work (Wilson et al., 2017), which shows that simple SGD often yields better generalization performance than adaptive gradient methods, such as Adam. As pointed out by the latter, the difference in generalization performance may result from the different directions of the updates. Specifically, for each hidden unit, the SGD update of its input weight vector can only lie in the span of all possible input vectors, which, however, is not the case for Adam, due to the individually adapted learning rates. We refer to this problem as the direction missing problem.

Second, while batch normalization (Ioffe & Szegedy, 2015) can significantly accelerate the convergence of DNNs, the input weights and the scaling factor of each hidden unit can be scaled in infinitely many (but consistent) ways, without changing the function implemented by the hidden unit. Thus, for different magnitudes of an input weight vector, the updates given by Adam can have different effects on the overall network function, which is undesirable. Furthermore, even when batch normalization is not used, a network using linear rectifiers (e.g., ReLU, leaky ReLU) as activation functions is still subject to ill-conditioning of the parameterization (Glorot et al., 2011), and hence to the same problem. We refer to this problem as the ill-conditioning problem.

### 2.2 L2 Weight Decay

L2 weight decay is a regularization technique frequently used with SGD. It often has a significant effect on the generalization performance of DNNs. Despite the simplicity and crucial role of L2 weight decay in the training process, it remains to be explained how it works in DNNs. A common justification for L2 weight decay is that it can be introduced by placing a Gaussian prior upon the weights, when the objective is to find the maximum a posteriori (MAP) weights (Blundell et al., 2015). However, as discussed in Sec. 2.1, the magnitudes of input weight vectors are irrelevant in terms of the overall network function, in some common scenarios, rendering the variance of the Gaussian prior meaningless.

We propose to view L2 weight decay in neural networks as a form of weight normalization, which may better explain its effect on the generalization performance. Consider a neural network trained with the following loss function:

$$\tilde L(\theta; \mathcal{D}) = L(\theta; \mathcal{D}) + \frac{\lambda}{2} \sum_{i \in N} \|w_i\|_2^2, \quad\text{(4)}$$

where $L(\theta; \mathcal{D})$ is the original loss function specified by the task, $\mathcal{D}$ is a batch of training data, $N$ is the set of all hidden units, and $w_i$ denotes the input weights of hidden unit $i$, which are included in the trainable parameters, $\theta$. For simplicity, we consider SGD updates without momentum. Therefore, the update of $w_i$ at each time step is

$$\Delta w_i = -\alpha \frac{\partial \tilde L}{\partial w_i} = -\alpha \left( \frac{\partial L}{\partial w_i} + \lambda w_i \right), \quad\text{(5)}$$

where $\alpha$ is the step size. As we can see from Eq. (5), the gradient magnitude of the L2 penalty is proportional to $\|w_i\|_2$, and thus forms a negative feedback loop that stabilizes $\|w_i\|_2$ to an equilibrium value. Empirically, we find that $\|w_i\|_2$ tends to increase or decrease dramatically at the beginning of the training, and then varies mildly within a small range, which indicates $\|\Delta w_i\|_2 \ll \|w_i\|_2$. In practice, we usually have $\alpha\lambda \ll 1$, thus $\Delta w_i$ is approximately orthogonal to $w_i$, i.e., $\Delta w_i \cdot w_i \approx 0$.

Let $l^{\parallel}_{w_i}$ and $l^{\perp}_{w_i}$ be the vector projection and rejection of $\partial L / \partial w_i$ on $w_i$, which are defined as

$$l^{\parallel}_{w_i} = \left( \frac{\partial L}{\partial w_i} \cdot \frac{w_i}{\|w_i\|_2} \right) \frac{w_i}{\|w_i\|_2}, \qquad l^{\perp}_{w_i} = \frac{\partial L}{\partial w_i} - l^{\parallel}_{w_i}. \quad\text{(6)}$$

From Eq. (5) and (6), it is easy to show

$$\frac{\|\Delta w_i\|_2}{\|w_i\|_2} \approx \frac{\|l^{\perp}_{w_i}\|_2}{\|l^{\parallel}_{w_i}\|_2}\, \alpha\lambda. \quad\text{(7)}$$

As discussed in Sec. 2.1, when batch normalization is used, or when linear rectifiers are used as activation functions, the magnitude of $w_i$ is irrelevant; it is the direction of $w_i$ that actually makes a difference in the overall network function. If L2 weight decay is not applied, the magnitude of $w_i$'s direction change will decrease as $\|w_i\|_2$ increases during the training process, which can potentially lead to overfitting (discussed in detail in Sec. 3.2). On the other hand, Eq. (7) shows that L2 weight decay implicitly normalizes the weights, such that the magnitude of $w_i$'s direction change does not depend on $\|w_i\|_2$, and can be tuned through the product of $\alpha$ and $\lambda$.

While L2 weight decay produces the normalization effect in an implicit and approximate way, we will show that explicitly doing so can result in further improved optimization and generalization performance.

## 3 Normalized Direction-preserving Adam

We first present the normalized direction-preserving Adam (ND-Adam) algorithm, which essentially improves the optimization of the input weights of hidden units, while employing the vanilla Adam algorithm to update other parameters. Specifically, we divide the trainable parameters, $\theta$, into two sets, $\theta^v$ and $\theta^s$, such that $\theta^v = \{w_i : i \in N\}$, and $\theta^s = \theta \setminus \theta^v$. Then we update $\theta^v$ and $\theta^s$ by different rules, as described by Alg. 1. The learning rates for the two sets of parameters are denoted respectively by $\alpha^v_t$ and $\alpha^s_t$.

In Alg. 1, the iteration over $N$ can be performed in parallel, and thus introduces no extra computational complexity. Compared to Adam, computing $\|g_t(w_i)\|_2^2$ and $\|\bar w_{i,t}\|_2$ may take slightly more time, which, however, is negligible in practice. On the other hand, to estimate the second-order moment of each $w_i \in \mathbb{R}^{n_i}$, Adam maintains $n_i$ scalars, whereas ND-Adam requires only one scalar, $v_t(w_i)$. Thus, ND-Adam has a smaller memory overhead than Adam.

In the following, we address the direction missing problem and the ill-conditioning problem discussed in Sec. 2.1, and explain Alg. 1 in detail. We show how the proposed algorithm jointly solves the two problems, as well as its relation to other normalization schemes.

### 3.1 Preserving Gradient Directions

Assuming the stationarity of a hidden unit’s input distribution, the SGD update (possibly with momentum) of the input weight vector is a linear combination of historical gradients, and thus can only lie in the span of the input vectors. As a result, the input weight vector itself will eventually converge to the same subspace.

On the contrary, the Adam algorithm adapts the global learning rate to each scalar parameter independently, such that the gradient of each parameter is normalized by a running average of its magnitudes, which changes the direction of the gradient. To preserve the direction of the gradient w.r.t. each input weight vector, we generalize the learning rate adaptation scheme from scalars to vectors.

Let $m_t(w_i)$, $v_t(w_i)$, $g_t(w_i)$ be the counterparts of $m_t$, $v_t$, $g_t$ for the vector $w_i$. Since Eq. (1a) is a linear combination of historical gradients, it can be extended to vectors without any change; or equivalently, we can rewrite it for each vector as

$$m_t(w_i) = \beta_1 m_{t-1}(w_i) + (1-\beta_1)\, g_t(w_i). \quad\text{(8)}$$

We then extend Eq. (1b) as

$$v_t(w_i) = \beta_2 v_{t-1}(w_i) + (1-\beta_2)\, \|g_t(w_i)\|_2^2, \quad\text{(9)}$$

i.e., instead of estimating the average gradient magnitude for each individual parameter, we estimate the average of $\|g_t(w_i)\|_2^2$ for each vector $w_i$. In addition, we modify Eqs. (2) and (3) accordingly as

$$\hat m_t(w_i) = \frac{m_t(w_i)}{1-\beta_1^t}, \qquad \hat v_t(w_i) = \frac{v_t(w_i)}{1-\beta_2^t}, \quad\text{(10)}$$

and

$$w_{i,t} = w_{i,t-1} - \frac{\alpha^v_t}{\sqrt{\hat v_t(w_i)} + \epsilon}\, \hat m_t(w_i). \quad\text{(11)}$$

Here, $\hat m_t(w_i)$ is a vector of the same dimension as $w_i$, whereas $\hat v_t(w_i)$ is a scalar. Therefore, when applying Eq. (11), the direction of the update is the negative direction of $\hat m_t(w_i)$, and thus lies in the span of the historical gradients of $w_i$.
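The vector-wise moment estimation of Eqs. (8)-(11) can be sketched as follows; this is a minimal, dependency-free sketch for a single weight vector, not the full Alg. 1 (which additionally includes the spherical normalization of Sec. 3.2):

```python
import math

def nd_adam_vector_step(w, g, m, v, t,
                        alpha_v=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ND-Adam update for an input weight vector w (Eqs. 8-11).
    The second moment is a single scalar per vector, so the update
    direction is exactly the negative direction of m_hat."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]   # Eq. (8)
    g_sq = sum(gi * gi for gi in g)
    v = beta2 * v + (1 - beta2) * g_sq                            # Eq. (9)
    m_hat = [mi / (1 - beta1 ** t) for mi in m]                   # Eq. (10)
    v_hat = v / (1 - beta2 ** t)
    scale = alpha_v / (math.sqrt(v_hat) + eps)
    w = [wi - scale * mhi for wi, mhi in zip(w, m_hat)]           # Eq. (11)
    return w, m, v
```

In contrast to Adam's element-wise normalization, the update here is anti-parallel to $\hat m_t(w_i)$, so the gradient direction is preserved.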

It is worth noting that only the input to the first layer (i.e., the training data) is stationary throughout training. Thus, for the weights of an upper layer to converge to the span of its input vectors, it is necessary for the lower layers to converge first. Interestingly, this predicted phenomenon may have been observed in practice (Brock et al., 2017).

Despite the empirical success of SGD, a question remains as to why it is desirable to constrain the input weights in the span of the input vectors. A possible explanation is related to the manifold hypothesis, which suggests that real-world data presented in high-dimensional spaces (images, audio, text, etc.) concentrates on manifolds of much lower dimensionality (Cayton, 2005; Narayanan & Mitter, 2010). In fact, commonly used activation functions, such as (leaky) ReLU, sigmoid, tanh, can only be activated (not saturating or having small gradients) by a portion of the input vectors, in whose span the input weights lie upon convergence. Assuming the local linearity of the manifolds of data or hidden-layer representations, constraining the input weights in the subspace that contains some of the input vectors encourages the hidden units to form local coordinate systems on the corresponding manifold, which can lead to good representations (Rifai et al., 2011).

### 3.2 Spherical Weight Optimization

The ill-conditioning problem occurs when the magnitude change of an input weight vector can be compensated by other parameters, such as the scaling factor of batch normalization, or the output weight vector, without affecting the overall network function. Consequently, suppose we have two DNNs that parameterize the same function, but with some of the input weight vectors having different magnitudes, applying the same SGD or Adam update rule will, in general, change the network functions in different ways. Thus, the ill-conditioning problem makes the training process more opaque and difficult to control.

More importantly, when the weights are not properly regularized (e.g., not using L2 weight decay), the magnitude of $w_i$'s direction change will decrease as $\|w_i\|_2$ increases during the training process. As a result, the "effective" learning rate for $w_i$ tends to decrease faster than expected, making the network converge to sharp minima (Hoffer et al., 2017). It is well known that sharp minima generalize worse than flat minima (Hochreiter & Schmidhuber, 1997; Keskar et al., 2017).

As shown in Sec. 2.2, L2 weight decay can alleviate the ill-conditioning problem by implicitly and approximately normalizing the weights. However, we still do not have precise control over $\|\Delta w_i\|_2 / \|w_i\|_2$, since $\|l^{\perp}_{w_i}\|_2 / \|l^{\parallel}_{w_i}\|_2$ is unknown and not necessarily stable. Moreover, the approximation fails when $\|w_i\|_2$ is far from its equilibrium value.

To address the ill-conditioning problem in a more principled way, we restrict the L2-norm of each $w_i$ to $1$, and only optimize its direction. In other words, instead of optimizing $w_i$ in an $n_i$-dimensional space, we optimize $w_i$ on an $(n_i - 1)$-dimensional unit sphere. Specifically, we first obtain the raw gradient w.r.t. $w_i$, $\bar g_t(w_i) = \partial L / \partial w_i$, and project the gradient onto the unit sphere as

$$g_t(w_i) = \bar g_t(w_i) - \left( \bar g_t(w_i) \cdot w_{i,t-1} \right) w_{i,t-1}. \quad\text{(12)}$$

Here, $\|w_{i,t-1}\|_2 = 1$. Then we follow Eqs. (8)-(10), and replace Eq. (11) with

$$\bar w_{i,t} = w_{i,t-1} - \frac{\alpha^v_t}{\sqrt{\hat v_t(w_i)} + \epsilon}\, \hat m_t(w_i), \quad\text{(13a)}$$
$$w_{i,t} = \frac{\bar w_{i,t}}{\|\bar w_{i,t}\|_2}. \quad\text{(13b)}$$

In Eq. (12), we keep only the component of $\bar g_t(w_i)$ that is orthogonal to $w_{i,t-1}$. However, $\hat m_t(w_i)$ is not necessarily orthogonal as well. In addition, even when $\hat m_t(w_i)$ is orthogonal to $w_{i,t-1}$, Eq. (13a) can still increase $\|w_i\|_2$, according to the Pythagorean theorem. Therefore, we explicitly normalize $\bar w_{i,t}$ in Eq. (13b), to ensure $\|w_{i,t}\|_2 = 1$ after each update.
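Eqs. (12)-(13b) can be sketched as follows, assuming for illustration that $\hat m_t(w_i)$ and $\hat v_t(w_i)$ have already been computed by Eqs. (8)-(10):

```python
import math

def spherical_step(w, g_raw, m_hat, v_hat, alpha_v=0.05, eps=1e-8):
    """One spherical weight update for a unit-norm weight vector w.
    Returns the projected gradient (Eq. 12) and the new unit-norm
    weights (Eqs. 13a-13b)."""
    # Eq. (12): keep only the component of g_raw orthogonal to w
    radial = sum(gi * wi for gi, wi in zip(g_raw, w))
    g = [gi - radial * wi for gi, wi in zip(g_raw, w)]
    # Eq. (13a): Adam-style step with a per-vector scalar second moment
    step = alpha_v / (math.sqrt(v_hat) + eps)
    w_bar = [wi - step * mi for wi, mi in zip(w, m_hat)]
    # Eq. (13b): renormalize, so the norm stays exactly 1
    n = math.sqrt(sum(x * x for x in w_bar))
    w_new = [x / n for x in w_bar]
    return g, w_new
```

The projected gradient is orthogonal to the current weights, and the explicit renormalization keeps the weight vector on the unit sphere regardless of the step taken.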

We now have

$$\|\Delta w_{i,t}\|_2 = \|w_{i,t} - w_{i,t-1}\|_2 \approx \frac{\alpha^v_t}{\sqrt{\hat v_t(w_i)} + \epsilon}\, \|\hat m_t(w_i)\|_2, \quad\text{(14)}$$

thus we can control the learning rate for $w_i$ through a single hyperparameter, $\alpha^v_t$. Note that it is possible to control $\|\Delta w_{i,t}\|_2$ more precisely, by normalizing $g_t(w_i)$ by $\|g_t(w_i)\|_2$, instead of by $\sqrt{\hat v_t(w_i)}$. However, by doing so, we lose the information provided by $\|g_t(w_i)\|_2$ at different time steps. In addition, since $\hat m_t(w_i)$ is less noisy than $g_t(w_i)$, $\|\hat m_t(w_i)\|_2$ becomes small near convergence, which is considered a desirable property of Adam (Kingma & Ba, 2014). Thus, we keep the gradient normalization scheme intact.

Compared to L2 weight decay, spherical weight optimization explicitly normalizes the weight vectors, such that each update to the weight vectors only changes their directions, and strictly keeps the magnitudes constant. For nonlinear activation functions, such as sigmoid and tanh, an extra scaling factor is needed for each hidden unit to express functions that require unnormalized weight vectors. For instance, given an input vector $x$, and a nonlinearity $\phi$, the activation of hidden unit $i$ is then given by

$$y_i = \phi\left( \gamma_i\, w_i \cdot x + b_i \right), \quad\text{(15)}$$

where $\gamma_i$ is the scaling factor, and $b_i$ is the bias.

### 3.3 Relation to Weight Normalization and Batch Normalization

A related normalization and reparameterization scheme, weight normalization (Salimans & Kingma, 2016), has been developed as an alternative to batch normalization, aiming to accelerate the convergence of SGD optimization. We note the difference between spherical weight optimization and weight normalization. First, the weight vector of each hidden unit is not directly normalized in weight normalization, i.e., $\|w_i\|_2 \neq 1$ in general. At training time, the activation of hidden unit $i$ is

$$y_i = \phi\left( \frac{\gamma_i}{\|w_i\|_2}\, w_i \cdot x + b_i \right), \quad\text{(16)}$$

which is equivalent to Eq. (15) for the forward pass. For the backward pass, however, the gradient w.r.t. $w_i$ still depends on $\|w_i\|_2$ in weight normalization, hence it does not solve the ill-conditioning problem. At inference time, both of these two schemes can combine $\gamma_i$ and $w_i$ into a single equivalent weight vector, $\tilde w_i = \gamma_i w_i$, or $\tilde w_i = \frac{\gamma_i}{\|w_i\|_2} w_i$, respectively.

While spherical weight optimization naturally encompasses weight normalization, it can further benefit from batch normalization. When combined with batch normalization, Eq. (15) evolves into

$$y_i = \phi\left( \gamma_i\, \mathrm{BN}(w_i \cdot x) + b_i \right), \quad\text{(17)}$$

where $\mathrm{BN}(\cdot)$ represents the transformation done by batch normalization without scaling and shifting. Here, $\gamma_i$ serves as the scaling factor for both the normalized weight vector and batch normalization. At training time, the distribution of the input vector, $x$, changes over time, slowing down the training of the sub-network composed of the upper layers. Salimans & Kingma (2016) observe that such a problem cannot be eliminated by normalizing the weight vectors alone, but can be substantially mitigated by combining weight normalization with mean-only batch normalization.

Additionally, in linear rectifier networks, the scaling factors, $\gamma_i$, can be removed (or set to $1$), without changing the overall network function. Since $w_i \cdot x$ is standardized by batch normalization, we have

$$\mathbb{E}_x\left[ \mathrm{BN}(w_i \cdot x)^2 \right] \approx 1, \quad\text{(18)}$$

and hence

$$\mathrm{Var}_x\left[ \mathrm{BN}(w_i \cdot x) + b_i \right] \approx 1. \quad\text{(19)}$$

Therefore, the activations $y_i$ belonging to the same layer, which form the different dimensions of the input fed to the upper layer, will also have comparable variances, which potentially makes the weight updates of the upper layer more stable. For these reasons, we combine the use of spherical weight optimization and batch normalization, as shown in Eq. (17).

## 4 Batch-normalized Softmax

For multi-class classification tasks, the softmax function is the de facto activation function for the output layer. Despite its simplicity and intuitive probabilistic interpretation, the learning signal it backpropagates may not always be desirable.

When using cross entropy as the surrogate loss with one-hot target vectors, the prediction is considered correct as long as $\operatorname{argmax}_{c \in C} z_c$ is the target class, where $z_c$ is the logit before the softmax activation, corresponding to category $c \in C$. Thus, the logits can be positively scaled together without changing the predictions, even though the cross entropy and its derivatives will vary with the scaling factor. Specifically, denoting the scaling factor by $\eta$, the gradient w.r.t. each logit is

$$\frac{\partial L}{\partial z_{\hat c}} = \eta \left[ \frac{\exp(\eta z_{\hat c})}{\sum_{c \in C} \exp(\eta z_c)} - 1 \right], \quad\text{(20a)}$$
$$\frac{\partial L}{\partial z_{\bar c}} = \eta\, \frac{\exp(\eta z_{\bar c})}{\sum_{c \in C} \exp(\eta z_c)}, \quad\text{(20b)}$$

where $\hat c$ is the target class, and $\bar c \in C \setminus \{\hat c\}$.

For Adam and ND-Adam, since the gradient w.r.t. each scalar or vector is normalized, the absolute magnitudes of Eqs. (20a) and (20b) are irrelevant; it is the relative magnitudes that make a difference here. When $\eta$ is small, we have

$$\lim_{\eta \to 0} \left| \frac{\partial L / \partial z_{\bar c}}{\partial L / \partial z_{\hat c}} \right| = \frac{1}{|C| - 1}, \quad\text{(21)}$$

which indicates that, when the magnitude of the logits is small, softmax encourages the logit of the target class to increase, while equally penalizing those of the other classes. On the other end of the spectrum, assuming no two logits are equal, we have

$$\lim_{\eta \to \infty} \left| \frac{\partial L / \partial z_{\bar c'}}{\partial L / \partial z_{\hat c}} \right| = 1, \qquad \lim_{\eta \to \infty} \left| \frac{\partial L / \partial z_{\bar c''}}{\partial L / \partial z_{\hat c}} \right| = 0, \quad\text{(22)}$$

where $\bar c' = \operatorname{argmax}_{c \in C \setminus \{\hat c\}} z_c$, and $\bar c'' \in C \setminus \{\hat c, \bar c'\}$. Eq. (22) indicates that, when the magnitude of the logits is large, softmax penalizes only the largest logit of the non-target classes. The latter is also referred to as the saturation problem of softmax in the literature (Oland et al., 2017).
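Both limits can be checked numerically. The sketch below (ours) computes the cross-entropy gradients of Eqs. (20a)-(20b) for jointly scaled logits:

```python
import math

def logit_grads(z, target, eta):
    """Gradients of the cross-entropy loss w.r.t. the logits z,
    when the logits are jointly scaled by eta (Eqs. 20a-20b)."""
    exps = [math.exp(eta * zc) for zc in z]
    s = sum(exps)
    return [eta * (e / s - (1.0 if c == target else 0.0))
            for c, e in enumerate(exps)]
```

With a small $\eta$, the non-target/target gradient ratio approaches $1/(|C|-1)$ for every non-target class; with a large $\eta$, only the largest non-target logit retains a non-negligible gradient.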

It is worth noting that both of these cases can happen without an explicit scaling factor. For instance, varying the norm of the weights of the softmax layer is equivalent to varying the value of $\eta$, in terms of the relative magnitudes of the gradients. In the case of small $\eta$, the logits of all non-target classes are penalized equally, regardless of the differences in $z_{\bar c}$ for different $\bar c$. However, it is more reasonable to penalize more strongly the logits that are closer to $z_{\hat c}$, since they are more likely to cause misclassification. In the case of large $\eta$, although the logit most likely to cause misclassification is strongly penalized, the logits of the other non-target classes are ignored. As a result, the logits of the non-target classes tend to be similar at convergence, ignoring the fact that some classes are closer to each other than others.

To exploit the prior knowledge that the magnitude of the logits should be neither too small nor too large, we apply batch normalization to the logits. Nevertheless, instead of setting the scaling factors as trainable variables, we consider them as a single hyperparameter, $\gamma$, such that all logits share the same scaling factor after normalization. Tuning the value of $\gamma$ can lead to a better trade-off between the two cases described by Eqs. (21) and (22). We refer to this method as batch-normalized softmax (BN-Softmax).
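A sketch of BN-Softmax follows (ours; the shared scaling factor $\gamma$ is a tunable hyperparameter, and whether to keep a shift term is a design choice, which we omit here):

```python
import math

def bn_softmax_logits(raw_logits_batch, gamma=2.5, eps=1e-5):
    """Batch-normalize each logit dimension (no shift), then apply a
    single scaling factor gamma shared by all logits (a hyperparameter,
    not a trained variable). gamma=2.5 is an arbitrary example value."""
    n = len(raw_logits_batch)
    dims = len(raw_logits_batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        col = [row[d] for row in raw_logits_batch]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        std = math.sqrt(var + eps)
        for i in range(n):
            out[i][d] = gamma * (raw_logits_batch[i][d] - mu) / std
    return out
```

After this transformation, each logit dimension has zero mean and standard deviation $\gamma$ over the batch, so the overall magnitude of the logits is pinned by the single hyperparameter.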

## 5 Experiments

In this section, we provide empirical evidence for the analysis in Sec. 2.2, and evaluate the performance of ND-Adam and BN-Softmax on CIFAR-10 and CIFAR-100.

### 5.1 The Effect of L2 Weight Decay

To empirically examine the effect of L2 weight decay, we train a wide residual network (WRN) (Zagoruyko & Komodakis, 2016), denoted WRN-d-k in the notation of Zagoruyko & Komodakis (2016) for a depth-d network whose width is k times that of a vanilla ResNet. We train the network on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009), with a small modification to the original WRN architecture, and with a different learning rate annealing schedule. Specifically, for simplicity and slightly better performance, we replace the last fully connected layer with a convolutional layer whose number of output feature maps equals the number of classes; i.e., we change the layers after the last residual block from BN-ReLU-GlobalAvgPool-FC-Softmax to BN-ReLU-Conv-GlobalAvgPool-Softmax. In addition, for clearer comparisons, the learning rate is annealed according to a cosine function without restart (Loshchilov & Hutter, 2016; Gastaldi, 2017).

As a common practice, we use SGD with momentum, for which the analysis is similar to that in Sec. 2.2. Due to the linearity of derivatives and momentum, the update can be decomposed as $\Delta w_i = \Delta w_i^L + \Delta w_i^{\lambda}$, where $\Delta w_i^L$ and $\Delta w_i^{\lambda}$ are the components corresponding to the original loss function, $L$, and the L2 penalty term (see Eq. (4)), respectively. Fig. 1a shows the ratio between the scalar projection of $\Delta w_i^L$ on $\Delta w_i^{\lambda}$ and $\|\Delta w_i^{\lambda}\|_2$, which indicates how the tendency of $\|w_i\|_2$ to increase is compensated by weight decay. Note that $\Delta w_i^{\lambda}$ points in the negative direction of $w_i$, even when momentum is used, since the direction change of $w_i$ is slow. As shown in Fig. 1a, at the beginning of the training, $\Delta w_i^L$ dominates, and $\|w_i\|_2$ quickly adjusts to its equilibrium value. During the middle stage of the training, the projection of $\Delta w_i^L$ on $\Delta w_i^{\lambda}$ and $\Delta w_i^{\lambda}$ itself almost cancel each other out. Then, near the end of the training, the gradient of $L$ diminishes rapidly to near zero, making $\Delta w_i^{\lambda}$ dominant again. Therefore, Eq. (7) holds most accurately during the middle stage of the training.

In Fig. 1b, we show how the value of $\|\Delta w_i\|_2 / \|w_i\|_2$ varies under different hyperparameter settings. By Eq. (7), $\|\Delta w_i\|_2 / \|w_i\|_2$ is expected to remain the same as long as the product $\alpha\lambda$ stays constant, which is confirmed by the fact that curves for settings sharing the same $\alpha\lambda$ overlap. However, comparing curves with different values of $\alpha\lambda$, we can see that $\|\Delta w_i\|_2 / \|w_i\|_2$ does not change in proportion to $\alpha\lambda$. On the other hand, by using ND-Adam, we can control the value of $\|\Delta w_i\|_2 / \|w_i\|_2$ more precisely, by adjusting the learning rate for weight vectors, $\alpha^v_t$. For the same training step, changes in $\alpha^v_t$ lead to approximately proportional changes in $\|\Delta w_i\|_2 / \|w_i\|_2$, as shown by the two curves corresponding to ND-Adam in Fig. 1b.