Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits

01/11/2022
by   Sunwoo Lee, et al.

Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm runs SGD independently on multiple workers and periodically averages the models across all workers. When local SGD runs with many workers, however, the periodic averaging causes significant model discrepancy across the workers, making the global loss converge slowly. While recent advanced optimization methods tackle this issue with a focus on non-IID settings, the model discrepancy persists because of the underlying periodic model averaging. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. Partial averaging encourages the local models to stay close to one another in parameter space, which allows the global loss to be minimized more effectively. Given a fixed number of iterations and a large number of workers (128), partial averaging achieves up to 2.2% higher validation accuracy than periodic full averaging.
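The contrast between periodic full averaging and partial averaging can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not the paper's implementation: each "model" is a plain NumPy vector, the loss is a simple noisy quadratic, and "partial averaging" is realized here as rotating through disjoint parameter blocks, averaging one block per iteration instead of the whole model every `period` iterations. The function names (`local_sgd_step`, `run`) and all hyperparameter values are hypothetical.

```python
import numpy as np

num_workers, dim, steps, period, lr = 8, 12, 200, 10, 0.05

def run(partial):
    """Return the mean model discrepancy over training, under either
    partial block-wise averaging or FedAvg-style periodic full averaging."""
    rng = np.random.default_rng(0)          # same noise for both configurations
    target = rng.normal(size=dim)           # optimum of 0.5 * ||w - target||^2
    workers = [rng.normal(size=dim) for _ in range(num_workers)]
    blocks = np.array_split(np.arange(dim), period)  # one block per iteration
    disc = []
    for t in range(steps):
        # one noisy local SGD step per worker
        workers = [w - lr * ((w - target) + 0.1 * rng.normal(size=dim))
                   for w in workers]
        if partial:
            # average only the current block, rotating through all blocks
            idx = blocks[t % period]
            avg = np.mean([w[idx] for w in workers], axis=0)
            for w in workers:
                w[idx] = avg
        elif (t + 1) % period == 0:
            # FedAvg-style: average the full model every `period` iterations
            avg = np.mean(workers, axis=0)
            workers = [avg.copy() for _ in range(num_workers)]
        # discrepancy: mean distance of each local model to the global average
        g = np.mean(workers, axis=0)
        disc.append(np.mean([np.linalg.norm(w - g) for w in workers]))
    return float(np.mean(disc))

print("full: ", run(partial=False))
print("partial:", run(partial=True))
```

Because the blocks are averaged in rotation, no parameter ever drifts for more than `period` local steps without being synchronized, which is the intuition behind the smaller model discrepancy claimed for partial averaging.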

