A Better Way to Decay: Proximal Gradient Training Algorithms for Neural Nets

10/06/2022
by Liu Yang et al.

Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional to the sum of squared weights. This paper argues that stochastic gradient descent (SGD) may be an inefficient algorithm for this objective. For neural networks with ReLU activations, solutions to the weight decay objective are equivalent to those of a different objective in which the regularization term is instead a sum of products of ℓ_2 (not squared) norms of the input and output weights associated with each ReLU. This alternative (and effectively equivalent) regularization suggests a novel proximal gradient algorithm for network training. Theory and experiments support the new training approach, showing that it can converge much faster to the sparse solutions it shares with standard weight decay training.
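A rough sketch of why the two objectives agree (the standard rescaling argument for a single ReLU unit with input weight vector w and output weight v): weight decay charges (λ/2)(‖w‖_2^2 + v^2), the alternative objective charges λ‖w‖_2 |v|, and (λ/2)(‖w‖_2^2 + v^2) ≥ λ‖w‖_2 |v| with equality when ‖w‖_2 = |v|; because a ReLU network is unchanged by the rescaling w → αw, v → v/α for α > 0, the weights can always be rebalanced to reach equality, so both penalties share the same global minimizers. The snippet below is a minimal, illustrative Python/PyTorch sketch of a proximal gradient training loop in this spirit: a gradient step on the unregularized data loss followed by a proximal step on the regularizer. The prox used here is the standard group soft-threshold (the group-lasso prox), a stand-in rather than the paper's actual proximal operator, and all names and hyperparameters (group_soft_threshold, lam, lr, the two-layer model) are assumptions for illustration only.

    import torch

    torch.manual_seed(0)
    n, d, width = 256, 10, 64
    X = torch.randn(n, d)
    y = torch.randn(n, 1)

    # Two-layer ReLU network: one row of W and one entry of v per ReLU unit.
    W = torch.randn(width, d, requires_grad=True)
    v = torch.randn(1, width, requires_grad=True)
    lr, lam = 1e-2, 1e-3  # step size and regularization strength (illustrative values)

    def group_soft_threshold(weights, thresh):
        # Shrink each row toward zero and zero it out entirely when its norm
        # falls below the threshold; this is what produces neuron-level sparsity.
        norms = weights.norm(dim=1, keepdim=True).clamp_min(1e-12)
        return weights * torch.clamp(1.0 - thresh / norms, min=0.0)

    for step in range(1000):
        loss = (torch.relu(X @ W.T) @ v.T - y).pow(2).mean()  # unregularized data loss
        loss.backward()
        with torch.no_grad():
            W -= lr * W.grad                            # gradient step on the loss...
            v -= lr * v.grad
            W.copy_(group_soft_threshold(W, lr * lam))  # ...then the proximal step
            v.copy_(group_soft_threshold(v.T, lr * lam).T)
            W.grad.zero_()
            v.grad.zero_()

In this sketch the prox step can zero out entire ReLU units whose weights are small, which mirrors the neuron-level sparsity the abstract refers to; the paper's own proximal operator for the product-of-norms regularizer may differ in form.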

Related research

08/25/2021 · Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Adaptive gradient methods such as Adam have gained increasing popularity...

02/22/2022 · Explicit Regularization via Regularizer Mirror Descent
Despite perfectly interpolating the training data, deep neural networks ...

06/12/2021 · Go Small and Similar: A Simple Output Decay Brings Better Performance
Regularization and data augmentation methods have been widely used and b...

05/25/2023 · Vector-Valued Variation Spaces and Width Bounds for DNNs: Insights on Weight Decay Regularization
Deep neural networks (DNNs) trained to minimize a loss term plus the sum...

12/21/2014 · SENNS: Sparse Extraction Neural NetworkS for Feature Extraction
By drawing on ideas from optimisation theory, artificial neural networks...

12/27/2020 · Understanding Decoupled and Early Weight Decay
Weight decay (WD) is a traditional regularization technique in deep lear...

08/07/2020 · Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations
Using weight decay to penalize the L2 norms of weights in neural network...
