Explicit Regularization via Regularizer Mirror Descent

02/22/2022
by   Navid Azizan, et al.

Despite perfectly interpolating the training data, deep neural networks (DNNs) can often generalize fairly well, in part due to the "implicit regularization" induced by the learning algorithm. Nonetheless, various forms of regularization, such as "explicit regularization" (via weight decay), are often used to avoid overfitting, especially when the data is corrupted. There are several challenges with explicit regularization, most notably unclear convergence properties. Inspired by convergence properties of stochastic mirror descent (SMD) algorithms, we propose a new method for training DNNs with regularization, called regularizer mirror descent (RMD). In highly overparameterized DNNs, SMD simultaneously interpolates the training data and minimizes a certain potential function of the weights. RMD starts with a standard cost, which is the sum of the training loss and a convex regularizer of the weights. Reinterpreting this cost as the potential of an "augmented" overparameterized network and applying SMD yields RMD. As a result, RMD inherits the properties of SMD and provably converges to a point "close" to the minimizer of this cost. RMD is computationally comparable to stochastic gradient descent (SGD) and weight decay, and is parallelizable in the same manner. Our experimental results on training sets with various levels of corruption suggest that the generalization performance of RMD is remarkably robust and significantly better than both SGD and weight decay, which implicitly and explicitly regularize the ℓ_2 norm of the weights, respectively. RMD can also be used to regularize the weights to a desired weight vector, which is particularly relevant for continual learning.
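Since the abstract describes RMD as the result of applying stochastic mirror descent to an augmented network whose potential encodes the regularized cost, a minimal sketch of the underlying SMD update may help. The snippet below shows one SMD step with a hypothetical ℓ_q-norm potential; the function names, the choice of potential, and the step size are illustrative assumptions and not the paper's exact RMD update, which operates on the augmented network described above.

```python
import numpy as np

# Minimal sketch of one stochastic mirror descent (SMD) step with an
# l_q-norm potential psi(w) = (1/q) * sum(|w_i|^q).
# This is an illustrative assumption, not the paper's exact RMD update,
# which applies SMD to an "augmented" network encoding loss + regularizer.

def grad_psi(w, q):
    # Mirror map: gradient of the q-norm potential, applied elementwise.
    return np.sign(w) * np.abs(w) ** (q - 1)

def grad_psi_inv(z, q):
    # Inverse mirror map: recovers w from grad_psi(w).
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_step(w, grad_loss, lr=1e-3, q=3.0):
    # SMD update in mirror coordinates:
    #   grad_psi(w_new) = grad_psi(w) - lr * grad_loss
    z = grad_psi(w, q) - lr * grad_loss
    return grad_psi_inv(z, q)

# Toy usage on a single least-squares sample (x, y): gradient of 0.5*(w.x - y)^2.
w = 0.1 * np.random.randn(4)
x, y = np.random.randn(4), 1.0
grad = (w @ x - y) * x
w = smd_step(w, grad, lr=0.01, q=3.0)
```

With a quadratic potential (q = 2) the update reduces to plain SGD, which is why the abstract contrasts RMD against SGD's implicit ℓ_2 regularization.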


