Unifying the Dropout Family Through Structured Shrinkage Priors

10/09/2018
by Eric Nalisnick, et al.

Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and not via any approximation. Given the equivalence, we then show that dropout's usual Monte Carlo training objective approximates marginal MAP estimation. We analyze this MAP objective under strong shrinkage, showing the expanded parametrization (i.e. likelihood noise) is more stable than a hierarchical representation. Lastly, we derive analogous priors for ResNets, RNNs, and CNNs and reveal their equivalent implementation as noise.
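The reparametrization claim can be illustrated with a small numerical sketch. It is not the paper's code. It assumes a zero-mean Gaussian prior on a single weight and Bernoulli noise with keep probability 0.5; neither choice is specified in the abstract. The function names `expanded_sample` and `hierarchical_sample` are hypothetical. The first draws the weight and multiplies it by noise (the expanded, noise-as-likelihood view); the second draws the noise first and uses it as the scale of the weight's Gaussian prior (the hierarchical, scale-mixture view). Under these assumptions the two should produce the same spike-and-slab distribution.

```python
import numpy as np


def expanded_sample(rng, n, sigma=1.0, keep_prob=0.5):
    """Expanded form: draw w ~ N(0, sigma^2), then multiply by Bernoulli noise xi."""
    xi = rng.binomial(1, keep_prob, size=n)  # multiplicative (dropout) noise
    w = rng.normal(0.0, sigma, size=n)       # assumed Gaussian prior on the weight
    return xi * w


def hierarchical_sample(rng, n, sigma=1.0, keep_prob=0.5):
    """Hierarchical form: draw the scale xi, then the weight ~ N(0, (xi * sigma)^2)."""
    xi = rng.binomial(1, keep_prob, size=n)  # the noise now acts as a prior scale
    # A scale of 0 collapses the Gaussian to a point mass at 0 (the "spike").
    return rng.normal(0.0, sigma * xi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = expanded_sample(rng, 200_000)
    b = hierarchical_sample(rng, 200_000)
    # Both should have variance near keep_prob * sigma^2 and about half exact zeros.
    print("variance:", a.var(), b.var())
    print("fraction of zeros:", (a == 0).mean(), (b == 0).mean())
```

Replacing the Bernoulli draw with a continuous positive distribution gives a continuous scale mixture in the same way; only the distribution of `xi` changes.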

research
06/10/2015

A Scale Mixture Perspective of Multiplicative Noise in Neural Networks

Corrupting the input and hidden layers of deep neural networks (DNNs) wi...
research
02/16/2021

Improving Bayesian Inference in Deep Neural Networks with Variational Structured Dropout

Approximate inference in deep Bayesian networks exhibits a dilemma of ho...
research
05/20/2017

Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Dropout-based regularization methods can be regarded as injecting random...
research
09/13/2020

Machine Learning's Dropout Training is Distributionally Robust Optimal

This paper shows that dropout training in Generalized Linear Models is t...
research
09/21/2019

ASNI: Adaptive Structured Noise Injection for shallow and deep neural networks

Dropout is a regularisation technique in neural network training where u...
research
09/19/2018

Removing the Feature Correlation Effect of Multiplicative Noise

Multiplicative noise, including dropout, is widely used to regularize de...
research
11/08/2017

Variational Gaussian Dropout is not Bayesian

Gaussian multiplicative noise is commonly used as a stochastic regularis...
