On the Regularization Properties of Structured Dropout

10/30/2019
by Ambar Pal, et al.

Dropout and its extensions (e.g., DropBlock and DropConnect) are popular heuristics for training neural networks, and have been shown to improve generalization performance in practice. However, a theoretical understanding of their optimization and regularization properties remains elusive. Recent work shows that in the case of single hidden-layer linear networks, Dropout is a stochastic gradient descent method for minimizing a regularized loss, and that the regularizer induces solutions that are low-rank and balanced. In this work we show that for single hidden-layer linear networks, DropBlock induces spectral k-support norm regularization and promotes solutions that are low-rank and have factors with equal norm. We also show that the global minimizer for DropBlock can be computed in closed form, and that DropConnect is equivalent to Dropout. We then show that some of these results can be extended to a general class of Dropout strategies and, under some assumptions, to deep non-linear networks when Dropout is applied to the last layer. We verify our theoretical claims and assumptions experimentally with commonly used network architectures.
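To make the "regularized loss" mentioned above concrete, prior work on Dropout for single hidden-layer linear networks (see "Dropout as a Low-Rank Regularizer for Matrix Factorization" under related research below) shows that averaging the objective over the Bernoulli masks yields a deterministic loss plus an explicit product regularizer. The sketch below states that identity in the matrix-factorization form; the symbols (X, U, V with columns u_i and v_i, masks m_i, retain probability theta) are our notation, not necessarily the paper's:

```latex
% Assumed setup: factorize X as U V^T, hidden width r, retain probability
% \theta, and i.i.d. Dropout masks m_i ~ Bernoulli(\theta) on the hidden
% units. Averaging the rescaled Dropout objective over the masks gives
\mathbb{E}_{\mathbf{m}} \left\| X - \tfrac{1}{\theta}\, U \operatorname{diag}(\mathbf{m})\, V^{\top} \right\|_F^2
  = \left\| X - U V^{\top} \right\|_F^2
  + \frac{1-\theta}{\theta} \sum_{i=1}^{r} \lVert u_i \rVert_2^2 \, \lVert v_i \rVert_2^2
```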
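The identity can be checked numerically by Monte Carlo sampling over the Dropout masks. The following is a minimal self-contained sketch; the dimensions, random seed, and retain probability are arbitrary illustrative choices, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, theta = 8, 5, 0.6           # data dimension, hidden width, retain probability
X = rng.standard_normal((d, d))   # target matrix to factorize as U V^T
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))

# Monte Carlo estimate of E_m || X - (1/theta) U diag(m) V^T ||_F^2
# with independent masks m_i ~ Bernoulli(theta) on the hidden units.
trials = 100_000
total = 0.0
for _ in range(trials):
    m = rng.binomial(1, theta, size=r)
    total += np.linalg.norm(X - (U * (m / theta)) @ V.T) ** 2
mc_loss = total / trials

# Closed form: data-fit term plus the induced product regularizer.
fit = np.linalg.norm(X - U @ V.T) ** 2
reg = (1 - theta) / theta * sum(
    np.linalg.norm(U[:, i]) ** 2 * np.linalg.norm(V[:, i]) ** 2
    for i in range(r)
)

print(mc_loss)      # Monte Carlo estimate
print(fit + reg)    # closed-form value; agrees up to sampling error
```

Replacing the independent per-unit Bernoulli draws with correlated, block-structured masks is, at a high level, what distinguishes DropBlock and leads to the spectral k-support norm regularizer analyzed in the paper.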


Related research

Implicit regularization of dropout (07/13/2022)
It is important to understand how the popular regularization method drop...

Dropout as a Low-Rank Regularizer for Matrix Factorization (10/13/2017)
Regularization for matrix factorization (MF) and approximation problems ...

On the Implicit Bias of Dropout (06/26/2018)
Algorithmic approaches endow deep learning systems with implicit bias th...

On the Inductive Bias of Dropout (12/15/2014)
Dropout is a simple but effective technique for learning in neural netwo...

Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families (05/11/2023)
Even though dropout is a popular regularization technique, its theoretic...

Information Dropout: Learning Optimal Representations Through Noisy Computation (11/04/2016)
The cross-entropy loss commonly used in deep learning is closely related...

On Connectivity of Solutions in Deep Learning: The Role of Over-parameterization and Feature Quality (02/18/2021)
It has been empirically observed that, in deep neural networks, the solu...
