Gradient-Coherent Strong Regularization for Deep Neural Networks

11/20/2018
by   Dae Hoon Park, et al.
0

Deep neural networks are often prone to over-fitting with their numerous parameters, so regularization plays an important role in generalization. L1 and L2 regularizers are common regularization tools in machine learning with their simplicity and effectiveness. However, we observe that imposing strong L1 or L2 regularization on deep neural networks with stochastic gradient descent easily fails, which limits the generalization ability of the underlying neural networks. To understand this phenomenon, we first investigate how and why learning fails when strong regularization is imposed on deep neural networks. We then propose a novel method, gradient-coherent strong regularization, which imposes regularization only when the gradients are kept coherent in the presence of strong regularization. Experiments are performed with multiple deep architectures on three benchmark data sets for image recognition. Experimental results show that our proposed approach indeed endures strong regularization and significantly improves both accuracy and compression, which could not be achieved otherwise.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/01/2018

Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications

Deep neural networks with remarkably strong generalization performances ...
research
07/29/2019

Deep Gradient Boosting

Stochastic gradient descent (SGD) has been the dominant optimization met...
research
04/04/2022

Evolving Neural Selection with Adaptive Regularization

Over-parameterization is one of the inherent characteristics of modern d...
research
09/03/2020

A Partial Regularization Method for Network Compression

Deep Neural Networks have achieved remarkable success relying on the dev...
research
06/06/2020

MMA Regularization: Decorrelating Weights of Neural Networks by Maximizing the Minimal Angles

The strong correlation between neurons or filters can significantly weak...
research
05/23/2018

Learning towards Minimum Hyperspherical Energy

Neural networks are a powerful class of nonlinear functions that can be ...
research
09/26/2019

Convolutional Neural Networks with Dynamic Regularization

Regularization is commonly used in machine learning for alleviating over...

Please sign up or login with your details

Forgot password? Click here to reset