The Lingering of Gradients: How to Reuse Gradients over Time

01/09/2019
by Zeyuan Allen-Zhu, et al.

Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the "lingering" of gradients: once a gradient is computed at x_k, the additional time to compute gradients at x_{k+1}, x_{k+2}, ... may be reduced. We show how this improves the running time of gradient descent and SVRG. For instance, if the "additional time" scales linearly with respect to the traveled distance, then the "convergence rate" of gradient descent can be improved from 1/T to e^{-T^{1/3}}. On the empirical side, we solve a hypothetical revenue management problem on the Yahoo! Front Page Today Module application with 4.6m users to 10^-6 error (or 10^-12 dual error) using 6 passes of the dataset.
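To make the "lingering" idea concrete, the sketch below shows one possible reading of it for a finite-sum objective: each component gradient, once computed at some point, is reused for free as long as the iterate stays within that component's lingering radius, and only the expired components are recomputed. This is an illustrative sketch under assumed interfaces (`grad_i`, `radius_i`, and the reuse rule are assumptions), not the paper's actual algorithm or cost model.

```python
import numpy as np

def lingering_gd(grad_i, radius_i, x0, n, lr=0.1, steps=100):
    """Gradient descent on f(x) = (1/n) * sum_i f_i(x) that reuses
    component gradients while they still "linger": a gradient of f_i
    computed at point p is reused as long as ||x - p|| <= radius_i(i, p).
    Returns the final iterate and the number of fresh gradient evaluations,
    which plays the role of the refined time complexity."""
    x = np.asarray(x0, dtype=float)
    cache = {}          # i -> (point where grad f_i was computed, cached gradient)
    fresh_evals = 0     # count only gradients that had to be recomputed

    for _ in range(steps):
        g = np.zeros_like(x)
        for i in range(n):
            if i in cache:
                p, gi = cache[i]
                if np.linalg.norm(x - p) <= radius_i(i, p):
                    g += gi              # still inside the lingering ball: reuse
                    continue
            gi = grad_i(i, x)            # lingering expired: recompute at x
            cache[i] = (x.copy(), gi)
            fresh_evals += 1
            g += gi
        x -= lr * (g / n)
    return x, fresh_evals

# Toy usage (assumed setup): f_i(x) = 0.5 * ||x - a_i||^2 with a constant radius.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 5))
    x_final, fresh = lingering_gd(
        grad_i=lambda i, x: x - A[i],
        radius_i=lambda i, p: 0.5,       # assumed constant lingering radius
        x0=np.zeros(5), n=50, steps=200)
    print(x_final, fresh)
```

Under the abstract's linear-in-distance cost model, the plain `fresh_evals` counter above would instead be weighted by how far the iterate has traveled since each cached gradient was computed; the sketch only illustrates why slow-moving iterates can make most gradient recomputations unnecessary.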


