Research of Damped Newton Stochastic Gradient Descent Method for Neural Network Training

03/31/2021
by Jingcheng Zhou, et al.

First-order methods such as stochastic gradient descent (SGD) are currently the most popular way to train deep neural networks (DNNs), while second-order methods are rarely used because of the high computational cost of obtaining second-order information. In this paper, we propose the Damped Newton Stochastic Gradient Descent (DN-SGD) and Stochastic Gradient Descent Damped Newton (SGD-DN) methods to train DNNs for regression problems with Mean Square Error (MSE) and classification problems with Cross-Entropy Loss (CEL). Both methods are inspired by the proven fact that the Hessian matrix of the last layer of a DNN is always positive semi-definite. Unlike other second-order methods, which estimate the Hessian matrix over all parameters, our methods compute it exactly only for a small subset of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD. Several numerical experiments on real datasets are performed to verify the effectiveness of our methods for regression and classification problems.
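The abstract describes applying a damped Newton update only to the last-layer parameters while the remaining parameters follow ordinary SGD. The NumPy sketch below illustrates that general idea for a one-hidden-layer regression network with MSE loss; the network sizes, damping constant, learning rate, and update order are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the idea behind DN-SGD as described in the abstract:
# a damped Newton step on the last-layer weights (whose Hessian is positive
# semi-definite for MSE with a linear output) and plain SGD on the earlier layer.
# All names and hyperparameters here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y is a noisy linear function of X
n, d, hidden_dim = 256, 10, 32
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=(d, 1)) + 0.1 * rng.normal(size=(n, 1))

# One hidden layer with ReLU, linear output layer
W1 = 0.1 * rng.normal(size=(d, hidden_dim))
w2 = 0.1 * rng.normal(size=(hidden_dim, 1))

lr, damping, epochs, batch = 0.05, 1e-2, 50, 32

for epoch in range(epochs):
    idx = rng.permutation(n)
    for start in range(0, n, batch):
        b = idx[start:start + batch]
        Xb, yb = X[b], y[b]
        m = len(b)

        # Forward pass
        Z = Xb @ W1                  # pre-activations, shape (m, hidden_dim)
        H = np.maximum(Z, 0.0)       # ReLU features
        err = H @ w2 - yb            # residuals, shape (m, 1)

        # Damped Newton step on the last layer: for MSE with a linear output,
        # the loss is quadratic in w2, so its Hessian (1/m) H^T H is positive
        # semi-definite; the damping term keeps the solve well conditioned.
        g2 = H.T @ err / m
        hess2 = H.T @ H / m + damping * np.eye(hidden_dim)
        w2 -= np.linalg.solve(hess2, g2)

        # Plain SGD step on the hidden-layer weights
        dH = err @ w2.T              # gradient w.r.t. ReLU outputs (up to 1/m)
        dZ = dH * (Z > 0)            # ReLU backward
        g1 = Xb.T @ dZ / m
        W1 -= lr * g1

    mse = np.mean((np.maximum(X @ W1, 0.0) @ w2 - y) ** 2)
    print(f"epoch {epoch}: MSE = {mse:.4f}")

The positive semi-definiteness of the last-layer Hessian is what makes the exact Newton solve cheap and safe here; whether the Newton step precedes or follows the SGD step is the distinction the paper draws between DN-SGD and SGD-DN.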


research · 05/28/2019
A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems
First-order methods such as stochastic gradient descent (SGD) are curren...

research · 02/20/2020
Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent
The minimization of the loss function is of paramount importance in deep...

research · 12/14/2015
Preconditioned Stochastic Gradient Descent
Stochastic gradient descent (SGD) still is the workhorse for many practi...

research · 06/17/2020
A block coordinate descent optimizer for classification problems exploiting convexity
Second-order optimizers hold intriguing potential for deep learning, but...

research · 11/18/2011
Krylov Subspace Descent for Deep Learning
In this paper, we propose a second order optimization method to learn mo...

research · 05/21/2018
Small steps and giant leaps: Minimal Newton solvers for Deep Learning
We propose a fast second-order method that can be used as a drop-in repl...

research · 01/29/2014
RES: Regularized Stochastic BFGS Algorithm
RES, a regularized stochastic version of the Broyden-Fletcher-Goldfarb-S...
