Deep Gradient Boosting

07/29/2019
by   Erhan Bilal, et al.

Stochastic gradient descent (SGD) has been the dominant optimization method for training deep neural networks due to its many desirable properties. One of the more remarkable and least understood qualities of SGD is that it generalizes relatively well to unseen data even when the neural network has millions of parameters. In this work, we show that SGD is an extreme case of deep gradient boosting (DGB) and as such is intrinsically regularized. The key idea of DGB is that back-propagated gradients calculated using the chain rule can be viewed as pseudo-residual targets; at each layer, the weight update is then obtained by solving the corresponding gradient boosting problem. We hypothesize that some learning tasks can benefit from a more lax regularization requirement, and this approach provides a way to control that. We tested this hypothesis on a number of benchmark data sets and show that DGB indeed outperforms SGD in a subset of cases, while under-performing on tasks that are more prone to over-fitting, such as image recognition.
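To make the abstract's key idea concrete, the sketch below treats the back-propagated gradient at a layer as a pseudo-residual target and computes the weight update by solving a small per-layer fitting problem. The function names, the ridge term, and the specific ridge-regularized least-squares formulation are illustrative assumptions, not the paper's exact algorithm; they are chosen so that plain SGD appears as the heavily regularized limit, mirroring the claim that SGD is an extreme case of DGB.

    import numpy as np

    def dgb_layer_update(H, G, lr=0.1, ridge=1e-3):
        """Hypothetical DGB-style update for a linear layer Y = H @ W.

        H     : (batch, n_in)  layer inputs
        G     : (batch, n_out) back-propagated gradient dL/dY,
                used as pseudo-residual targets
        ridge : regularization strength; as ridge grows, the update
                direction approaches the plain SGD step -H.T @ G
                (up to scaling), i.e. SGD as the heavily regularized
                extreme of this scheme.
        """
        n_in = H.shape[1]
        # Fit the pseudo-residuals by solving, in dW:
        #   min_dW || H @ dW + G ||^2 + ridge * ||dW||^2
        A = H.T @ H + ridge * np.eye(n_in)
        dW = np.linalg.solve(A, -(H.T @ G))
        return lr * dW

    def sgd_layer_update(H, G, lr=0.1):
        # Plain SGD step for comparison: dW = -lr * dL/dW = -lr * H.T @ G
        return -lr * (H.T @ G)

With a small ridge term the update solves the layer-wise boosting problem almost exactly; with a very large ridge term it degenerates to a scaled SGD step, which is the regularization dial the abstract alludes to.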


