STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

by Prashant Khanduri, et al.

Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WNs' and the server's directions are chosen based on a stochastic momentum estimator, the algorithm requires Õ(ε^(-3/2)) samples and Õ(ε^(-1)) communication rounds to compute an ε-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such near-optimal sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained. Finally, we show that for the classical FedAvg (a.k.a. Local SGD, which is a momentum-less special case of STEM), a similar trade-off curve exists, albeit with worse sample and communication complexities. Our insights on this trade-off provide guidelines for choosing the four important design elements for FL algorithms (the local update frequency, the WNs' and the server's update directions, and the minibatch sizes) to achieve the best performance.
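The abstract does not spell out the update rule, so the following is only a toy sketch of the general idea: each WN takes a few local steps along a recursive momentum direction, and the server then averages both the iterates and the directions. The STORM-style estimator, the scalar quadratic losses, the function names, and all parameter values (`lr`, `a`, `local_steps`, the noise level) are assumptions made for illustration and are not taken from the paper.

```python
import random


def stochastic_grad(x, center, noise):
    # Noisy gradient of the toy local loss 0.5 * (x - center)**2.
    return (x - center) + noise


def two_sided_momentum_fl(centers, rounds=200, local_steps=5,
                          lr=0.05, a=0.1, seed=0):
    """Toy federated loop with a momentum direction on both sides.

    centers: one scalar per worker; worker i minimizes 0.5*(x - centers[i])**2.
    Returns the shared iterate after the last communication round.
    """
    rng = random.Random(seed)
    k = len(centers)
    x = [0.0] * k  # local iterates, all starting from the same point

    # Initial directions: one noisy gradient per worker, then averaged.
    d = [stochastic_grad(x[i], centers[i], rng.gauss(0.0, 0.1))
         for i in range(k)]
    d = [sum(d) / k] * k

    for _ in range(rounds):
        # Worker side: a few local steps without communication.
        for _ in range(local_steps):
            for i in range(k):
                x_prev = x[i]
                x[i] = x_prev - lr * d[i]
                # Assumed STORM-style estimator: the same noise sample is
                # used at the new and the previous iterate.
                noise = rng.gauss(0.0, 0.1)
                g_new = stochastic_grad(x[i], centers[i], noise)
                g_old = stochastic_grad(x_prev, centers[i], noise)
                d[i] = g_new + (1.0 - a) * (d[i] - g_old)

        # Server side: average iterates and directions, then broadcast.
        x = [sum(x) / k] * k
        d = [sum(d) / k] * k

    return x[0]


if __name__ == "__main__":
    print(two_sided_momentum_fl([1.0, 2.0, 3.0, 4.0]))
```

On this toy problem the global minimizer is the mean of `centers`, so the returned value should settle close to it. Dropping the momentum correction so that each direction is just the fresh stochastic gradient gives a Local SGD style loop, which matches the abstract's description of FedAvg as the momentum-less special case.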






∙ 02/08/2021

Double Momentum SGD for Federated Learning

Communication efficiency is crucial in federated learning. Conducting ma...
∙ 10/08/2019

Accelerating Federated Learning via Momentum Gradient Descent

Federated learning (FL) provides a communication-efficient approach to s...
∙ 05/22/2020

FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data

Federated Learning (FL) has become a popular paradigm for learning from ...
∙ 10/07/2021

Neural Tangent Kernel Empowered Federated Learning

Federated learning (FL) is a privacy-preserving paradigm where multiple ...
∙ 08/23/2021

Anarchic Federated Learning

Present-day federated learning (FL) systems deployed over edge networks ...
∙ 05/23/2021

Fast Federated Learning by Balancing Communication Trade-Offs

Federated Learning (FL) has recently received a lot of attention for lar...
∙ 01/21/2021

Rate Region for Indirect Multiterminal Source Coding in Federated Learning

One of the main focus in federated learning (FL) is the communication ef...