Shakeout: A New Approach to Regularized Deep Neural Network Training

04/13/2019
by Guoliang Kang, et al.

Recent years have witnessed the success of deep neural networks in dealing with a variety of practical problems. Dropout has played an essential role in many successful deep neural networks by inducing regularization during model training. In this paper, we present a new regularized training approach: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer. This minor modification of Dropout has an appealing statistical property: the regularizer induced by Shakeout adaptively combines L_0, L_1 and L_2 regularization terms. Our classification experiments with representative deep architectures on the image datasets MNIST, CIFAR-10 and ImageNet show that Shakeout deals with over-fitting effectively and outperforms Dropout. We empirically demonstrate that Shakeout leads to sparser weights under both unsupervised and supervised settings. Shakeout also induces a grouping effect among the input units of a layer. When weights are used to reflect the importance of connections, Shakeout is superior to Dropout, which is valuable for deep model compression. Moreover, we demonstrate that Shakeout can effectively reduce the instability of training deep architectures.
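To make the "enhance or reverse" operation concrete, below is a minimal NumPy sketch of a Shakeout-style weight perturbation, written from the abstract's description. The two-branch rule and the hyperparameters tau (the probability of reversing a unit) and c (the reversal strength) follow the Shakeout formulation as commonly stated; the function name and default values are illustrative assumptions, not taken from this page.

```python
import numpy as np

def shakeout_weights(W, tau=0.5, c=0.1, rng=None):
    """Shakeout-style perturbation of a weight matrix W (shape: n_in x n_out).

    Per input unit: with probability tau, reverse the unit's outgoing
    weights to -c * sign(W); otherwise rescale them so that the expected
    effective weight equals W. The tau and c defaults here are
    illustrative values, not the paper's recommended settings.
    """
    rng = rng or np.random.default_rng()
    keep = rng.random(W.shape[0]) >= tau         # one Bernoulli draw per input unit
    sgn = np.sign(W)
    enhance = (W + c * tau * sgn) / (1.0 - tau)  # "enhance" branch (prob 1 - tau)
    reverse = -c * sgn                           # "reverse" branch (prob tau)
    return np.where(keep[:, None], enhance, reverse)

# Example: perturb a small weight matrix during a training step.
W = np.array([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.0]])
W_tilde = shakeout_weights(W, tau=0.5, c=0.1, rng=np.random.default_rng(0))
```

Note that under this rule the expected effective weight equals the original weight (E[W_tilde] = W), and setting c = 0 recovers inverted Dropout, which is why Shakeout can be viewed as a minor modification of Dropout.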


Related research

08/29/2018
Dropout with Tabu Strategy for Regularizing Deep Neural Networks
Dropout has proven to be an effective technique for regularization and p...

11/22/2015
Gradual DropIn of Layers to Train Very Deep Neural Networks
We introduce the concept of dynamically growing a neural network during ...

09/21/2019
ASNI: Adaptive Structured Noise Injection for shallow and deep neural networks
Dropout is a regularisation technique in neural network training where u...

06/14/2021
The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization
Among the most successful methods for sparsifying deep (neural) networks...

05/20/2016
Swapout: Learning an ensemble of deep architectures
We describe Swapout, a new stochastic training method, that outperforms ...

05/02/2022
Triangular Dropout: Variable Network Width without Retraining
One of the most fundamental design choices in neural networks is layer w...

08/16/2017
BitNet: Bit-Regularized Deep Neural Networks
We present a novel regularization scheme for training deep neural networ...
