On Dropout, Overfitting, and Interaction Effects in Deep Neural Networks

07/02/2020
by Benjamin Lengerich, et al.

We examine Dropout through the lens of interactions: learned effects that combine multiple input variables. Given N variables, there are O(N^2) possible pairwise interactions, O(N^3) possible 3-way interactions, and so on. We show that Dropout implicitly sets a learning rate for interaction effects that decays exponentially with the size of the interaction. This corresponds to a regularizer that counterbalances the hypothesis space, which grows exponentially with the number of variables in the interaction. This understanding has implications for the optimal Dropout rate: higher Dropout rates should be used when stronger regularization against spurious high-order interactions is needed. It also cautions against using Dropout to measure term saliency, because Dropout regularizes against high-order interaction terms. Finally, viewing Dropout as a regularizer of interaction effects provides insight into its varying effectiveness across architectures and data sets. We also compare Dropout to regularization via weight decay and early stopping, and find that it is difficult to obtain the same regularization effect for high-order interactions with these methods.
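
The scaling argument in the abstract can be illustrated with a small sketch. It assumes Dropout applied independently to each input variable with drop rate p, so that all k variables of a k-way interaction are present together with probability (1 - p)^k. Treating that probability as a stand-in for the relative learning rate of the interaction is a simplification made here for illustration and is not taken from the paper's derivation. The function names and the example values N = 20 and p = 0.5 are hypothetical.

```python
from math import comb


def survival_probability(drop_rate: float, order: int) -> float:
    """Probability that all `order` inputs of an interaction are kept.

    Assumes each input is dropped independently with probability `drop_rate`.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    return (1.0 - drop_rate) ** order


def count_interactions(num_variables: int, order: int) -> int:
    """Number of distinct k-way interactions among N variables: C(N, k)."""
    return comb(num_variables, order)


def summarize(num_variables: int, drop_rate: float, max_order: int):
    """Return (order, number of candidate terms, survival probability) rows."""
    return [
        (k, count_interactions(num_variables, k), survival_probability(drop_rate, k))
        for k in range(1, max_order + 1)
    ]


if __name__ == "__main__":
    # Hypothetical setting: 20 input variables, drop rate 0.5.
    for order, n_terms, keep in summarize(20, 0.5, 4):
        print(f"order={order}  candidate terms={n_terms}  P(all inputs kept)={keep}")
```

Under these assumptions, the number of candidate terms grows with the interaction order while the probability of observing all of an interaction's inputs together shrinks geometrically, which is the trade-off the abstract describes.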
