Faster Convergence of Local SGD for Over-Parameterized Models

01/30/2022
by   Tiancheng Qin, et al.

Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. We show an error bound of O(exp(-T)) for strongly-convex loss functions, where T is the total number of iterations. For general convex loss functions, we establish an error bound of O(1/T) under a mild data similarity assumption and an error bound of O(K/T) otherwise, where K is the number of local steps. We also extend our results to non-convex loss functions by proving an error bound of O(K/T). Before our work, the best-known convergence rate for strongly-convex loss functions was O(exp(-T/K)), and none existed for general convex or non-convex loss functions in the over-parameterized setting. We complement our results by providing problem instances in which these convergence rates are tight up to a constant factor under a reasonably small stepsize scheme. Finally, we validate our theoretical results using numerical experiments on real and synthetic data.
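To make the quantities in the abstract concrete, here is a minimal sketch of Local SGD (FedAvg): M workers each take K local SGD steps between averaging rounds, for T = K * R total iterations. The quadratic losses, synthetic data, and stepsize below are illustrative assumptions for an interpolating (over-parameterized) setting, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d = 4, 10          # number of workers, model dimension
K, R = 5, 40          # local steps per round, communication rounds (T = K * R)
eta = 0.05            # stepsize

# Heterogeneous local data: worker m holds (A_m, b_m) with loss
# f_m(x) = 0.5 * ||A_m x - b_m||^2. Interpolation holds here because a
# common minimizer x* satisfies A_m x* = b_m for every worker.
x_star = rng.normal(size=d)
A = [rng.normal(size=(20, d)) for _ in range(M)]
b = [A_m @ x_star for A_m in A]

x_global = np.zeros(d)
for r in range(R):
    local_models = []
    for m in range(M):
        x = x_global.copy()
        for _ in range(K):                # K local stochastic gradient steps
            i = rng.integers(len(b[m]))   # sample one data point
            grad = (A[m][i] @ x - b[m][i]) * A[m][i]
            x -= eta * grad
        local_models.append(x)
    x_global = np.mean(local_models, axis=0)  # communication: average the models

print("final distance to x*:", np.linalg.norm(x_global - x_star))
```

In the interpolating regime simulated above, the stochastic gradient noise vanishes at the shared minimizer, which is the structural property the paper exploits to obtain faster rates than in the general heterogeneous setting.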


research
10/16/2018

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

Modern machine learning focuses on highly expressive models that are abl...
research
02/10/2021

Stability of SGD: Tightness Analysis and Improved Bounds

Stochastic Gradient Descent (SGD) based methods have been widely used fo...
research
08/11/2023

Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

The recently proposed stochastic Polyak stepsize (SPS) and stochastic li...
research
10/09/2019

Learning Near-optimal Convex Combinations of Basis Models with Generalization Guarantees

The problem of learning an optimal convex combination of basis models ha...
research
02/13/2018

Fast Global Convergence via Landscape of Empirical Loss

While optimizing convex objective (loss) functions has been a powerhouse...
research
01/19/2023

Convergence beyond the over-parameterized regime using Rayleigh quotients

In this paper, we present a new strategy to prove the convergence of dee...
research
09/10/2019

Better Communication Complexity for Local SGD

We revisit the local Stochastic Gradient Descent (local SGD) method and ...
