G-TRACER: Expected Sharpness Optimization

06/24/2023
by John Williams, et al.

We propose a new regularization scheme for the optimization of deep learning architectures, G-TRACER ("Geometric TRACE Ratio"), which promotes generalization by seeking flat minima and has a sound theoretical basis as an approximation to a natural-gradient-descent-based optimization of a generalized Bayes objective. Augmenting the loss function with a TRACER term yields curvature-regularized optimizers (e.g., SGD-TRACER and Adam-TRACER) that are simple to implement as modifications to existing optimizers and do not require extensive tuning. We show that the method converges to a neighborhood (whose size depends on the regularization strength) of a local minimum of the unregularized objective, and we demonstrate competitive performance on a number of benchmark computer vision and NLP datasets, with a particular focus on challenging low signal-to-noise ratio problems.
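The abstract does not reproduce the paper's update rules, but the general mechanics of a curvature-regularized optimizer can be illustrated. The sketch below is a minimal PyTorch example, assuming a Hutchinson-style estimate of the Hessian trace as a stand-in for the TRACER term; the function name `hessian_trace_estimate`, the probe count, and the strength `rho` are illustrative assumptions, not the paper's actual regularizer.

```python
import torch
from torch import nn

def hessian_trace_estimate(loss, params, n_probes=1):
    """Hutchinson estimator of tr(H): E_v[v^T H v] over Rademacher probes v.

    Illustrative stand-in for a generic curvature penalty; NOT the paper's
    exact TRACER term.
    """
    # First-order grads with create_graph=True so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    est = 0.0
    for _ in range(n_probes):
        # Rademacher probe vectors with entries in {-1, +1}.
        probes = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]
        # Hessian-vector products via double backprop; create_graph=True
        # keeps the penalty itself differentiable w.r.t. the parameters.
        hvps = torch.autograd.grad(grads, params, grad_outputs=probes,
                                   create_graph=True)
        est = est + sum((v * h).sum() for v, h in zip(probes, hvps))
    return est / n_probes

# Toy SGD-TRACER-style step (hypothetical model, data, and rho).
model = nn.Sequential(nn.Linear(10, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 10), torch.randn(32, 1)
rho = 0.1  # regularization strength (illustrative value)

opt.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
penalty = hessian_trace_estimate(loss, list(model.parameters()))
(loss + rho * penalty).backward()  # augmented, flat-minimum-seeking objective
opt.step()
```

In the paper's formulation the penalty is instead derived as a trace ratio from a natural-gradient treatment of a generalized Bayes objective, and Adam-TRACER applies the same augmentation on top of Adam; the Hutchinson stand-in above is simply the shortest route to seeing the mechanics of a loss augmented with a curvature term.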

Related research

07/18/2020 · On regularization of gradient descent, layer imbalance and flat minima
We analyze the training dynamics for deep linear networks using a new me...

06/11/2021 · Label Noise SGD Provably Prefers Flat Global Minimizers
In overparametrized models, the noise in stochastic gradient descent (SG...

06/14/2020 · Entropic gradient descent algorithms and wide flat minima
The properties of flat minima in the empirical risk landscape of neural ...

07/06/2022 · When does SGD favor flat minima? A quantitative characterization via linear stability
The observation that stochastic gradient descent (SGD) favors flat minim...

03/31/2023 · Per-Example Gradient Regularization Improves Learning Signals from Noisy Data
Gradient regularization, as described in [citation], is a highly effective t...

01/20/2022 · Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
In this paper, we study the sharpness of a deep learning (DL) loss lands...

07/13/2023 · Implicit regularization in AI meets generalized hardness of approximation in optimization – Sharp results for diagonal linear networks
Understanding the implicit regularization imposed by neural network arch...
