Achieving acceleration despite very noisy gradients

02/10/2023
by Kanan Gupta, et al.

We present a novel momentum-based first-order optimization method (AGNES) that provably achieves acceleration for convex minimization, even when the stochastic noise in the gradient estimates is many orders of magnitude larger than the gradient itself. We model the noise as having a variance proportional to the magnitude of the underlying gradient. We argue, based on empirical evidence, that this is an appropriate model for mini-batch gradients in overparameterized deep learning. Furthermore, we demonstrate that the method achieves competitive performance in the training of CNNs on MNIST and CIFAR-10.
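To make the noise model concrete, here is a minimal Python sketch of a stochastic gradient oracle whose noise has standard deviation proportional to the norm of the true gradient, verified by Monte Carlo on a convex quadratic. The names (noisy_grad, grad_fn, A), the specific Gaussian form of the noise, and the choice sigma = 100 are illustrative assumptions; the paper's own formalization and the AGNES update rule are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(grad_fn, x, sigma):
    # Gradient oracle with multiplicative noise: zero-mean noise whose
    # standard deviation is proportional to the norm of the true gradient
    # (an illustrative reading of "variance proportional to the magnitude
    # of the underlying gradient"). Not the paper's exact formalization.
    g = grad_fn(x)
    xi = sigma * np.linalg.norm(g) / np.sqrt(g.size) * rng.standard_normal(g.shape)
    return g + xi

# Convex quadratic f(x) = 0.5 * x @ A @ x with gradient A @ x (hypothetical test problem).
A = np.diag(np.linspace(1.0, 100.0, 20))
grad_fn = lambda x: A @ x
x = np.ones(20)

g_true = grad_fn(x)
sigma = 100.0  # noise two orders of magnitude larger than the gradient
samples = np.stack([noisy_grad(grad_fn, x, sigma) - g_true for _ in range(10000)])

# Empirically, E||noise||^2 is close to sigma^2 * ||g||^2: the noise dwarfs
# the signal, yet it vanishes wherever the gradient vanishes.
print("||g||^2           :", np.sum(g_true**2))
print("sigma^2 * ||g||^2 :", sigma**2 * np.sum(g_true**2))
print("mean ||noise||^2  :", np.mean(np.sum(samples**2, axis=1)))

Under this model the noise is two orders of magnitude larger than the gradient at sigma = 100 but still vanishes only where the gradient does, which is the regime the abstract argues matches mini-batch gradients in overparameterized deep learning.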


Related research

03/01/2017 · Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization
In this paper, we develop a new accelerated stochastic gradient method f...

05/22/2017 · Reducing Reparameterization Gradient Variance
Optimization with noisy gradients has become ubiquitous in statistics an...

07/09/2020 · A Study of Gradient Variance in Deep Learning
The impact of gradient noise on training deep models is widely acknowled...

03/18/2016 · Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
Nesterov's momentum trick is famously known for accelerating gradient de...

02/21/2019 · Interplay Between Optimization and Generalization of Stochastic Gradient Descent with Covariance Noise
The choice of batch-size in a stochastic optimization algorithm plays a ...

09/02/2022 · Revisiting Outer Optimization in Adversarial Training
Despite the fundamental distinction between adversarial and natural trai...

12/05/2022 · Rethinking the Structure of Stochastic Gradients: Empirical and Statistical Evidence
Stochastic gradients closely relate to both optimization and generalizat...
