Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

10/06/2022
by Ryo Karakida, et al.

Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. Although several studies have reported that GR improves generalization performance in deep learning, little attention has been paid to it from an algorithmic perspective, that is, to the algorithms by which GR can improve performance efficiently. In this study, we first reveal that a specific finite-difference computation, composed of both a gradient ascent step and a gradient descent step, reduces the computational cost of GR. In addition, this computation empirically achieves better generalization performance. Next, we theoretically analyze a solvable model, a diagonal linear network, and clarify that GR has a desirable implicit bias in a certain problem setting. In particular, learning with finite-difference GR selects better minima as the ascent step size becomes larger. Finally, we demonstrate that finite-difference GR is closely related to other algorithms that rely on iterated ascent and descent steps to explore flat minima, namely sharpness-aware minimization and the flooding method; in particular, we reveal that flooding performs finite-difference GR implicitly. Thus, this work broadens our understanding of GR in both practice and theory.
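To make the finite-difference computation concrete, below is a minimal sketch of one training step on the regularized objective L(θ) + (γ/2)‖∇L(θ)‖², using the approximation ∇(½‖∇L(θ)‖²) = ∇²L(θ)∇L(θ) ≈ [∇L(θ + α∇L(θ)) − ∇L(θ)]/α, where α is the ascent step size. This is an illustrative reconstruction under stated assumptions, not the authors' reference implementation: the function name, the default values of alpha and gamma, and the use of PyTorch with a generic model, loss_fn, and optimizer are all assumptions of the sketch.

```python
# Sketch of a finite-difference gradient-regularization (GR) step.
# Assumptions (not from the paper's code): PyTorch; generic `model`,
# `loss_fn`, `optimizer`; `alpha` = ascent step size, `gamma` = GR coefficient.
import torch

def finite_difference_gr_step(model, loss_fn, inputs, targets, optimizer,
                              alpha=0.1, gamma=0.01):
    """Approximate one step on L(theta) + (gamma/2) * ||grad L(theta)||^2
    with two first-order gradient evaluations instead of a Hessian-vector product."""
    # 1) Gradient at the current parameters theta.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]

    # 2) Gradient ascent step: theta' = theta + alpha * grad L(theta).
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(g, alpha=alpha)

    # 3) Gradient at the ascent point theta'.
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()
    grads_ascent = [p.grad.detach().clone() for p in model.parameters()]

    # 4) Restore theta and descend with the combined gradient:
    #    grad L(theta) + (gamma / alpha) * (grad L(theta') - grad L(theta)),
    #    i.e. grad L plus a finite-difference estimate of gamma * H grad L.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(g, alpha=alpha)  # undo the ascent step
        for p, g, ga in zip(model.parameters(), grads, grads_ascent):
            p.grad = g + (gamma / alpha) * (ga - g)
    optimizer.step()
    return loss.item()
```

In this form the cost per step is two gradient evaluations, and the ascent step size alpha doubles as the finite-difference width, which is the quantity whose enlargement the abstract associates with selecting better minima.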
