Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning

08/10/2018
by Noah Frazier-Logue, et al.

Multi-layer neural networks have led to remarkable performance on many kinds of benchmark tasks in text, speech, and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting. One approach to this overfitting and related problems (local minima, collinearity, feature discovery, etc.) is called dropout (Srivastava et al., 2014; Baldi et al., 2016). This method removes hidden units with a Bernoulli random variable with probability p over updates. In this paper we show that dropout is a special case of a more general model published originally in 1990 called the stochastic delta rule (SDR; Hanson, 1990). SDR parameterizes each weight in the network as a random variable with mean μ_{w_ij} and standard deviation σ_{w_ij}. These random variables are sampled on each forward activation, consequently creating an exponential number of potential networks with shared weights. Both parameters are updated according to prediction error, thus implementing weight noise injections that reflect a local history of prediction error and efficient model averaging. SDR therefore implements a local, gradient-dependent simulated annealing per weight, converging to a Bayes-optimal network. Tests on standard benchmarks (CIFAR) using a modified version of DenseNet show that SDR outperforms standard dropout by over 50% in error and over 50% in loss, and reaches a solution much faster, attaining a training error of 5% in just 15 epochs with DenseNet-40, compared to 94 epochs for standard DenseNet-40.
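As a rough sketch of the mechanism described above (not the authors' implementation), the NumPy fragment below samples each weight from its own Gaussian on every forward pass and adjusts both the mean and the standard deviation from the error gradient. The class name, the hyperparameters alpha, beta, and zeta, and the exact form of the sigma update are illustrative assumptions.

import numpy as np

class SDRLinear:
    """One fully connected layer whose weights are per-weight Gaussians (SDR sketch)."""

    def __init__(self, n_in, n_out, alpha=0.01, beta=0.01, zeta=0.99, seed=0):
        self.rng = np.random.default_rng(seed)
        self.mu = self.rng.normal(0.0, 0.1, size=(n_in, n_out))  # weight means
        self.sigma = np.full((n_in, n_out), 0.1)                  # weight standard deviations
        self.alpha, self.beta, self.zeta = alpha, beta, zeta

    def forward(self, x):
        # Sample one concrete weight matrix from the weight distributions; over many
        # updates this realizes an exponential number of networks with shared parameters.
        noise = self.rng.standard_normal(self.mu.shape)
        self.w_sample = self.mu + self.sigma * noise
        self.x = x
        return x @ self.w_sample

    def backward(self, grad_out):
        # The gradient of the loss with respect to the sampled weights drives both updates.
        grad_w = self.x.T @ grad_out
        self.mu -= self.alpha * grad_w                   # delta-rule step on the mean
        # Weights with persistently large error gradients keep a larger standard deviation,
        # while the multiplicative decay zeta < 1 anneals the noise toward zero over time.
        self.sigma = self.zeta * self.sigma + self.beta * np.abs(grad_w)
        return grad_out @ self.w_sample.T                # gradient passed to the previous layer

In this framing, standard dropout corresponds to replacing the adaptive, per-weight Gaussian sampling with a fixed Bernoulli mask on the hidden units (keep probability p), i.e. a special case in which the noise distribution is neither weight-specific nor driven by the prediction error.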


