Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

08/11/2023
by Xiaowen Jiang, et al.

The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution, which may result in an output worse than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al. [2022]), this approach results in slower convergence rates for convex and over-parameterized models. In this work, we make two contributions. Firstly, we propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings and maintain sub-linear and linear convergence rates for convex and strongly convex functions when training over-parameterized models. AdaSLS requires no knowledge of problem-dependent parameters, and AdaSPS requires only a lower bound on the optimal function value as input. Secondly, we equip AdaSPS and AdaSLS with a novel variance-reduction technique and obtain algorithms that require 𝒪(n + 1/ϵ) gradient evaluations to achieve an 𝒪(ϵ)-suboptimality for convex functions, improving upon the slower 𝒪(1/ϵ^2) rates of AdaSPS and AdaSLS without variance reduction in the non-interpolation regime. Moreover, our result matches the fast rates of AdaSVRG but removes the inner-outer-loop structure, making the algorithms easier to implement and analyze. Finally, numerical experiments on synthetic and real datasets validate our theory and demonstrate the effectiveness and robustness of our algorithms.
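
As background for the step-size rule that AdaSPS builds on, below is a minimal sketch of SGD with the classic stochastic Polyak stepsize, γ_t = (f_i(x_t) − f_i*) / (c‖∇f_i(x_t)‖²), run on a toy interpolating least-squares problem. The constant c, the stepsize cap, the choice f_i* = 0, and the toy data are illustrative assumptions; this sketch does not reproduce the AdaSPS or AdaSLS step sizes proposed in the paper.

```python
# Minimal sketch: SGD with the classic stochastic Polyak stepsize (SPS)
# on a toy least-squares problem where interpolation holds.
# The constant c, the stepsize cap, and f_i^* = 0 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true            # interpolation holds: every f_i can reach 0, so f_i^* = 0

def f_i(x, i):
    """Per-sample least-squares loss f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    r = A[i] @ x - b[i]
    return 0.5 * r ** 2

def grad_i(x, i):
    """Gradient of f_i at x."""
    r = A[i] @ x - b[i]
    return r * A[i]

x = np.zeros(d)
c = 0.5           # SPS scaling constant (assumption)
gamma_max = 10.0  # cap on the stepsize (assumption)

for t in range(2000):
    i = rng.integers(n)
    g = grad_i(x, i)
    g_norm2 = g @ g
    if g_norm2 == 0.0:
        continue  # this sample is already at its minimum
    # Classic SPS rule: gamma_t = (f_i(x_t) - f_i^*) / (c * ||grad f_i(x_t)||^2)
    gamma = min(f_i(x, i) / (c * g_norm2), gamma_max)
    x = x - gamma * g

print("final full-batch objective:", 0.5 * np.mean((A @ x - b) ** 2))
```

In this interpolating toy problem the per-sample optimal values are exactly zero, which is what lets the plain SPS rule converge without a decreasing schedule; the AdaSPS and AdaSLS variants described in the abstract are designed to retain this behavior while also guaranteeing convergence when interpolation fails.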

Related research

05/24/2019  Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
01/30/2022  Faster Convergence of Local SGD for Over-Parameterized Models
02/18/2021  SVRG Meets AdaGrad: Painless Variance Reduction
12/26/2020  Variance Reduction on Adaptive Stochastic Mirror Descent
07/01/2014  SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
04/19/2021  Random Reshuffling with Variance Reduction: New Analysis and Better Rates
05/30/2023  BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization
