Pushing the bounds of dropout

05/23/2018
by Gábor Melis, et al.

We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with tighter and higher lower bounds than the fully stochastic dropout objective. We argue that since the deterministic subvariant's bound is equal to its objective, and the highest amongst these models, the predominant view of it as a good approximation to MC averaging is misleading. Rather, deterministic dropout is the best available approximation to the true objective.

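To make the ideas above concrete, here is a minimal sketch contrasting three test-time prediction rules for a dropout-trained classifier: Monte Carlo averaging over sampled masks (the arithmetic mean, i.e. a power mean with exponent 1), a geometric-mean variant (the power-mean limit as the exponent goes to 0), and the standard deterministic rule that rescales activations by the keep probability. The toy model, layer sizes, and the `power_mean` helper are illustrative assumptions, not the paper's exact model family or objectives.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# A toy two-layer classifier with dropout on the hidden layer (hypothetical sizes).
W1 = torch.randn(100, 64)
W2 = torch.randn(64, 10)
KEEP_PROB = 0.5


def forward(x, mask=None):
    """One forward pass; `mask` is a sampled dropout mask, or None for
    the deterministic rule that rescales by the keep probability."""
    h = torch.relu(x @ W1)
    if mask is None:
        h = h * KEEP_PROB          # deterministic (weight-scaling) dropout
    else:
        h = h * mask               # stochastic dropout with an explicit mask
    return F.log_softmax(h @ W2, dim=-1)


def power_mean(log_probs, r, eps=1e-8):
    """Power mean with exponent r over the mask dimension (dim 0).
    r=1 recovers the arithmetic mean (MC dropout); r -> 0 approaches the
    geometric mean, i.e. averaging log-probabilities across masks."""
    if abs(r) < eps:
        return log_probs.mean(dim=0).exp()
    return log_probs.exp().pow(r).mean(dim=0).pow(1.0 / r)


x = torch.randn(3, 100)            # a small batch of inputs

# Monte Carlo averaging over K sampled dropout masks.
K = 32
masks = (torch.rand(K, 64) < KEEP_PROB).float()
mc_log_probs = torch.stack([forward(x, m) for m in masks])

arithmetic = power_mean(mc_log_probs, r=1.0)   # standard MC-dropout average
geometric = power_mean(mc_log_probs, r=0.0)    # geometric-mean variant

# Deterministic dropout: a single pass with activations scaled by KEEP_PROB.
deterministic = forward(x).exp()

print(arithmetic[0], geometric[0], deterministic[0])
```

Under these assumptions, the arithmetic mean is the usual MC-dropout estimate, the geometric mean is one member of the power-mean family over sampled masks, and the deterministic pass is the single forward computation that the abstract argues should be viewed as the best available approximation to the true objective rather than merely an approximation to MC averaging.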
