Variance Reduced Stochastic Gradient Descent with Neighbors

06/11/2015
by Thomas Hofmann et al.

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per-data-point corrections in memory. Therefore, speed-ups relative to SGD may need a minimal number of epochs in order to materialize. This paper investigates algorithms that can exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points, which offers advantages in the transient optimization phase. As a side-product, we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms. We provide experimental results supporting our theory.
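To make the "memorization" view concrete, the following minimal Python sketch implements a SAGA-style update, in which each data point i keeps one stored gradient alpha_i in memory and each step moves along the unbiased direction f'_i(x) - alpha_i + alpha_bar. The names here (saga_sketch, grad_i, lr) are illustrative assumptions, not taken from the paper's code; a comment marks where a neighborhood-sharing variant in the spirit of this paper would additionally overwrite the memories of i's neighbors.

import numpy as np

def saga_sketch(grad_i, x0, n, lr=0.01, steps=1000, rng=None):
    # Minimal SAGA-style "memorization" update (a sketch, not the
    # paper's reference code). grad_i(i, x) returns the per-example
    # gradient f'_i(x) for data point i.
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    # One stored gradient alpha_i per data point, plus their running mean.
    alpha = np.stack([grad_i(i, x) for i in range(n)])
    alpha_bar = alpha.mean(axis=0)
    for _ in range(steps):
        i = rng.integers(n)
        g = grad_i(i, x)
        # Variance-reduced direction; unbiased since E[alpha_i] = alpha_bar.
        x -= lr * (g - alpha[i] + alpha_bar)
        # Refresh the memory of point i. A neighborhood variant, as
        # studied in this paper, would also overwrite the memories of
        # i's precomputed neighbors with g.
        alpha_bar += (g - alpha[i]) / n
        alpha[i] = g
    return x

# Usage on a toy least-squares problem, f_i(x) = 0.5 * (A[i] @ x - b[i])**2:
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 10)), rng.normal(size=100)
x_opt = saga_sketch(lambda i, x: (A[i] @ x - b[i]) * A[i],
                    np.zeros(10), n=100, rng=rng)

Initializing all alpha_i takes one pass over the data, which is why such methods may need a few epochs before outpacing plain SGD; sharing gradient information across neighboring points, as proposed here, is intended to speed up that transient phase, at the cost of some bias in the stored corrections.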


