Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families

Even though dropout is a popular regularization technique, its theoretical properties are not fully understood. In this paper we study dropout regularization in extended generalized linear models based on double exponential families, for which the dispersion parameter can vary with the features. A theoretical analysis shows that dropout regularization prefers rare but important features in both the mean and the dispersion, generalizing an earlier result for conventional generalized linear models. Training is performed using stochastic gradient descent with an adaptive learning rate. To illustrate, we apply dropout to adaptive smoothing with B-splines, where both the mean and dispersion parameters are modelled flexibly. The important B-spline basis functions can be thought of as rare features, and we confirm in experiments that dropout is an effective form of regularization for mean and dispersion parameters that improves on a penalized maximum likelihood approach with an explicit smoothness penalty.
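To make the setup concrete, the sketch below (not code from the paper) trains a toy extended GLM of this kind with dropout. It assumes the Gaussian member of the double exponential family, a log link for the dispersion, a cubic B-spline design matrix built with scipy, dropout applied to the basis functions at each stochastic gradient step, and an Adagrad-style per-coordinate adaptive learning rate; all of these modelling choices are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: dropout-regularized SGD for a Gaussian extended GLM where
# both the mean and the (log-)dispersion are linear in B-spline features.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)

# Toy data: smooth mean with heteroscedastic noise on [0, 1].
n = 500
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1 + 0.3 * x, n)

# Cubic B-spline design matrix with repeated boundary knots.
degree, n_knots = 3, 20
knots = np.concatenate(([0.0] * degree, np.linspace(0, 1, n_knots), [1.0] * degree))
n_basis = len(knots) - degree - 1
B = np.column_stack([
    BSpline.basis_element(knots[j:j + degree + 2], extrapolate=False)(x)
    for j in range(n_basis)
])
B = np.nan_to_num(B)  # basis elements are nan outside their support

beta = np.zeros(n_basis)     # coefficients for the mean
gamma = np.zeros(n_basis)    # coefficients for the log-dispersion
g_beta = np.zeros(n_basis)   # Adagrad gradient accumulators
g_gamma = np.zeros(n_basis)
p_keep, lr, epochs, batch = 0.8, 0.1, 200, 32

for _ in range(epochs):
    for idx in np.array_split(rng.permutation(n), n // batch):
        Bb, yb = B[idx], y[idx]
        # Dropout on the basis functions, rescaled to keep the expectation.
        mask = rng.binomial(1, p_keep, size=n_basis) / p_keep
        Bd = Bb * mask
        mu = Bd @ beta
        phi = np.exp(Bd @ gamma)  # dispersion, kept positive by the log link
        resid = yb - mu
        # Gradients of the average Gaussian negative log-likelihood.
        grad_beta = -Bd.T @ (resid / phi) / len(idx)
        grad_gamma = Bd.T @ (0.5 * (1.0 - resid**2 / phi)) / len(idx)
        # Adagrad update: per-coordinate adaptive learning rate.
        g_beta += grad_beta**2
        g_gamma += grad_gamma**2
        beta -= lr * grad_beta / (np.sqrt(g_beta) + 1e-8)
        gamma -= lr * grad_gamma / (np.sqrt(g_gamma) + 1e-8)

print("fitted mean near x=0.5:", (B @ beta)[np.argmin(np.abs(x - 0.5))])
print("fitted dispersion near x=0.5:", np.exp(B @ gamma)[np.argmin(np.abs(x - 0.5))])
```

In this sketch the dropout noise on the design matrix acts as the only regularizer; a penalized maximum likelihood alternative would instead add an explicit smoothness penalty on beta and gamma to the loss.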
