A Scale Mixture Perspective of Multiplicative Noise in Neural Networks

06/10/2015
by Eric Nalisnick, et al.

Corrupting the input and hidden layers of deep neural networks (DNNs) with multiplicative noise, often drawn from the Bernoulli distribution (or 'dropout'), provides regularization that has significantly contributed to deep learning's success. However, understanding how multiplicative corruptions prevent overfitting has been difficult due to the complexity of a DNN's functional form. In this paper, we show that when a Gaussian prior is placed on a DNN's weights, applying multiplicative noise induces a Gaussian scale mixture, which can be reparameterized to circumvent the problematic likelihood function. Analysis can then proceed by using a type-II maximum likelihood procedure to derive a closed-form expression revealing how regularization evolves as a function of the network's weights. Results show that multiplicative noise forces weights to become either sparse or invariant to rescaling. We find that our analysis has implications for model compression, as it naturally reveals a weight-pruning rule that starkly contrasts with the commonly used signal-to-noise ratio (SNR). While the SNR prunes weights with large variances, seeing them as noisy, our approach recognizes their robustness and retains them. We empirically demonstrate that our approach has a strong advantage over the SNR heuristic and is competitive with retraining on soft targets produced by a teacher model.
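To make the pruning contrast concrete, below is a minimal NumPy sketch comparing the SNR heuristic, which ranks weights by |mean| / std and therefore discards high-variance weights, against a variance-tolerant rule that ranks by |mean| alone and so retains them. The variance-tolerant scoring function and the synthetic per-weight posterior statistics are assumptions made purely for illustration; they stand in for the paper's closed-form criterion, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-weight posterior statistics for one layer (illustrative only):
# a posterior mean and standard deviation for each of 1000 weights.
mu = rng.normal(0.0, 1.0, size=1000)
sigma = rng.uniform(0.05, 1.5, size=1000)

def prune_by_snr(mu, sigma, keep_frac=0.5):
    """SNR heuristic: score each weight by |mu| / sigma.
    Weights with large posterior variance get low scores and are pruned."""
    score = np.abs(mu) / sigma
    k = int(keep_frac * mu.size)
    return np.argsort(score)[-k:]  # indices of the k highest-scoring weights

def prune_variance_tolerant(mu, sigma, keep_frac=0.5):
    """Illustrative alternative: score by |mu| only, so a large posterior
    variance does not penalize a weight (a stand-in for a rule that treats
    high-variance weights as robust rather than noisy)."""
    score = np.abs(mu)
    k = int(keep_frac * mu.size)
    return np.argsort(score)[-k:]

keep_snr = prune_by_snr(mu, sigma)
keep_vt = prune_variance_tolerant(mu, sigma)

# The SNR rule keeps weights with systematically smaller posterior std.
print("mean sigma kept (SNR rule):              ", sigma[keep_snr].mean())
print("mean sigma kept (variance-tolerant rule):", sigma[keep_vt].mean())
```

On these synthetic statistics, the weights kept by the SNR rule have a noticeably smaller average standard deviation than those kept by the variance-tolerant rule, mirroring the qualitative difference between the two pruning criteria described in the abstract.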


Related research

- Removing the Feature Correlation Effect of Multiplicative Noise (09/19/2018): Multiplicative noise, including dropout, is widely used to regularize de...
- Unifying the Dropout Family Through Structured Shrinkage Priors (10/09/2018): Dropout regularization of deep neural networks has been a mysterious yet...
- Think Global, Act Local: Relating DNN generalisation and node-level SNR (02/11/2020): The reasons behind good DNN generalisation remain an open question. In t...
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise (05/20/2017): Dropout-based regularization methods can be regarded as injecting random...
- Walking Noise: Understanding Implications of Noisy Computations on Classification Tasks (12/20/2022): Machine learning methods like neural networks are extremely successful a...
- On the benefits of non-linear weight updates (07/25/2022): Recent work has suggested that the generalisation performance of a DNN i...
- Optimum Decoder for Multiplicative Spread Spectrum Image Watermarking with Laplacian Modeling (05/01/2017): This paper investigates the multiplicative spread spectrum watermarking ...
