Per-Example Gradient Regularization Improves Learning Signals from Noisy Data

03/31/2023
by Xuran Meng, et al.

Gradient regularization, as described in <cit.>, is a highly effective technique for promoting flat minima during gradient descent. Empirical evidence suggests that it can significantly enhance the robustness of deep learning models against noisy perturbations while also reducing test error. In this paper, we study per-example gradient regularization (PEGR) and present a theoretical analysis demonstrating its effectiveness in reducing test error and improving robustness against noise perturbations. Specifically, we adopt a signal-noise data model from <cit.> and show that PEGR learns the signal effectively while suppressing noise. In contrast, standard gradient descent struggles to distinguish the signal from the noise, leading to suboptimal generalization. Our analysis reveals that PEGR penalizes the variance of pattern learning, thereby suppressing memorization of noise from the training data. These findings underscore the importance of variance control in deep learning training and offer useful insights for developing more effective training approaches.
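To make the mechanics concrete: instead of penalizing the norm of the gradient of the averaged loss, PEGR adds the average squared norm of each example's own loss gradient to the objective. The PyTorch snippet below is a minimal sketch of this idea on a toy linear regression problem; the model, data, and hyperparameters (`lam`, `lr`, the loop-based per-example gradients) are illustrative assumptions for exposition, not the paper's experimental setup.

```python
# Minimal sketch of per-example gradient regularization (PEGR) on a toy
# linear regression problem. All sizes and hyperparameters are assumptions.
import torch

torch.manual_seed(0)
n, d = 32, 10
X = torch.randn(n, d)                      # toy inputs
y = torch.randn(n, 1)                      # toy targets
W = torch.randn(d, 1, requires_grad=True)  # model parameters

lam, lr = 0.1, 0.01  # regularization strength and step size (assumed)
for step in range(100):
    # Per-example losses (no reduction), so each example keeps its own gradient.
    losses = ((X @ W - y) ** 2).squeeze(1)  # shape (n,)

    # Average squared norm of the per-example gradients, built with
    # create_graph=True so the penalty is itself differentiable.
    grad_sq = 0.0
    for i in range(n):
        g_i = torch.autograd.grad(losses[i], W, create_graph=True)[0]
        grad_sq = grad_sq + (g_i ** 2).sum()
    grad_sq = grad_sq / n

    # PEGR objective: mean loss + lam * average squared per-example grad norm.
    obj = losses.mean() + lam * grad_sq
    g = torch.autograd.grad(obj, W)[0]
    with torch.no_grad():
        W -= lr * g
```

Since E||g_i||^2 = ||E g_i||^2 + tr Cov(g_i), penalizing per-example gradient norms penalizes the variance of the per-example gradients on top of the full-batch gradient norm, which matches the variance-control interpretation in the abstract. In practice the per-example gradients would be vectorized (e.g. via torch.func) rather than looped as here.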

Related Research

09/23/2020 · Implicit Gradient Regularization
Gradient descent can be surprisingly good at optimizing deep neural netw...

02/06/2022 · Anticorrelated Noise Injection for Improved Generalization
Injecting artificial noise into gradient descent (GD) is commonly employ...

06/09/2022 · Explicit Regularization in Overparametrized Models via Noise Injection
Injecting noise within gradient descent has several desirable features. ...

05/22/2018 · Adversarially Robust Training through Structured Gradient Regularization
We propose a novel data-dependent structured gradient regularizer to inc...

07/18/2020 · On regularization of gradient descent, layer imbalance and flat minima
We analyze the training dynamics for deep linear networks using a new me...

06/24/2023 · G-TRACER: Expected Sharpness Optimization
We propose a new regularization scheme for the optimization of deep lear...

06/18/2020 · When Does Preconditioning Help or Hurt Generalization?
While second order optimizers such as natural gradient descent (NGD) oft...
