SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation

06/18/2020
by Robert M. Gower, et al.

We provide several convergence theorems for SGD for two large classes of structured non-convex functions: (i) quasar (strongly) convex functions and (ii) functions satisfying the Polyak-Łojasiewicz condition. Our analysis relies on the Expected Residual condition, which we show is strictly weaker than previously used growth conditions, expected smoothness, or bounded variance assumptions. We provide theoretical guarantees for the convergence of SGD under different step size selections, including constant, decreasing, and the recently proposed stochastic Polyak step size. Moreover, all of our analysis holds in the arbitrary sampling paradigm, which lets us give insights into the complexity of minibatching and determine an optimal minibatch size. In particular, we recover the best known convergence rates of full gradient descent and single-element sampling SGD as special cases. Finally, we show that for models that interpolate the training data, we can dispense with our Expected Residual condition and obtain state-of-the-art results in this setting.
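To make the stochastic Polyak step size mentioned above concrete, here is a minimal sketch of SGD using it. This is an illustrative reconstruction, not the paper's exact algorithm: it assumes single-element sampling, a squared loss f_i(w) = 0.5*(x_i·w - y_i)^2 with per-sample optimum f_i* = 0 (an interpolation regime), and hypothetical parameters c (scaling) and gamma_max (step size cap), so the step reduces to gamma_k = f_i(w_k) / (c * ||grad f_i(w_k)||^2), capped at gamma_max.

```python
import numpy as np

def sgd_polyak(X, y, n_iters=1000, c=0.5, gamma_max=1.0, seed=0):
    """SGD with a stochastic Polyak step size on least squares.

    Sketch under illustrative assumptions: single-element sampling,
    f_i(w) = 0.5*(x_i @ w - y_i)**2, and f_i^* = 0 (interpolation),
    so gamma_k = f_i(w_k) / (c * ||grad f_i(w_k)||^2), capped at gamma_max.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)            # sample one data point uniformly
        r = X[i] @ w - y[i]            # residual of sample i
        f_i = 0.5 * r ** 2             # stochastic loss value
        g = r * X[i]                   # stochastic gradient of f_i
        g_norm2 = g @ g
        if g_norm2 == 0.0:             # sample i is already fit exactly
            continue
        gamma = min(f_i / (c * g_norm2), gamma_max)  # Polyak step size
        w -= gamma * g
    return w

# Usage on a synthetic problem where interpolation holds (y = X @ w_true):
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
w_true = rng.standard_normal(10)
y = X @ w_true
w_hat = sgd_polyak(X, y, n_iters=5000)
print(np.linalg.norm(w_hat - w_true))  # should be close to 0
```

Note the adaptive behavior this sketch illustrates: the step size shrinks automatically as the stochastic loss approaches its minimum, which is why no manual learning rate schedule is needed in the interpolation setting.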


