Three Mechanisms of Weight Decay Regularization

10/29/2018
by Guodong Zhang, et al.

Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of L_2 regularization. Literal weight decay has been shown to outperform L_2 regularization for optimizers for which they differ. We empirically investigate weight decay for three optimization algorithms (SGD, Adam, and K-FAC) and a variety of network architectures. We identify three distinct mechanisms by which weight decay exerts a regularization effect, depending on the particular optimization algorithm and architecture: (1) increasing the effective learning rate, (2) approximately regularizing the input-output Jacobian norm, and (3) reducing the effective damping coefficient for second-order optimization. Our results provide insight into how to improve the regularization of neural networks.
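The distinction between literal weight decay and L_2 regularization can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the paper: the function names (`sgd_l2`, `sgd_decay`, `adam_step`), the toy weights and gradients, and the hyperparameter values are all assumptions. It shows that for plain SGD the two update rules coincide, while for Adam the L_2 penalty is rescaled by the adaptive preconditioner and therefore yields a different step than decaying the weights directly.

```python
import numpy as np

def sgd_l2(w, grad, lr, lam):
    # L2 regularization: the penalty gradient lam * w is added to the loss gradient.
    return w - lr * (grad + lam * w)

def sgd_decay(w, grad, lr, lam):
    # Literal weight decay: shrink the weights, then take a plain gradient step.
    return (1.0 - lr * lam) * w - lr * grad

def adam_step(w, grad, m, v, t, lr, lam, decoupled,
              b1=0.9, b2=0.999, eps=1e-8):
    # With decoupled=False the penalty passes through Adam's moment estimates
    # (L2 regularization); with decoupled=True the weights are decayed directly.
    g = grad if decoupled else grad + lam * w
    m = b1 * m + (1.0 - b1) * g
    v = b2 * v + (1.0 - b2) * g * g
    m_hat = m / (1.0 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1.0 - b2 ** t)   # bias-corrected second moment
    w_new = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w_new = w_new - lr * lam * w
    return w_new, m, v

# Toy values, chosen only for illustration.
w = np.array([1.0, -2.0])
grad = np.array([0.5, 0.3])
lr, lam = 0.1, 0.1

# SGD: both formulations give the same update.
sgd_same = np.allclose(sgd_l2(w, grad, lr, lam), sgd_decay(w, grad, lr, lam))

# Adam: the two formulations give different updates.
zeros = np.zeros_like(w)
w_l2, _, _ = adam_step(w, grad, zeros, zeros, 1, lr, lam, decoupled=False)
w_wd, _, _ = adam_step(w, grad, zeros, zeros, 1, lr, lam, decoupled=True)
adam_same = np.allclose(w_l2, w_wd)

print("SGD updates match:", sgd_same)
print("Adam updates match:", adam_same)
```

In this sketch the SGD equivalence holds because the penalty gradient is scaled by the same learning rate as the loss gradient. Under Adam, each coordinate of the penalty is divided by the running gradient magnitude, so the amount of shrinkage a weight receives no longer depends only on its size.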

