FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training

03/03/2023
by   Zhenheng Tang, et al.

Federated Learning (FL) enables collaboration among clients to train machine learning models while protecting their data privacy. Existing FL simulation platforms, designed from the perspective of traditional distributed training, suffer from laborious code migration between simulation and production, low efficiency, low GPU utilization, poor scalability with high hardware requirements, and difficulty simulating stateful clients. In this work, we first demystify the challenges and bottlenecks of simulating FL, and design a new FL system named FedML Parrot. It improves training efficiency, remarkably relaxes hardware requirements, and supports efficient large-scale FL experiments with stateful clients by: (1) training clients sequentially on each device; (2) decomposing the original aggregation into local aggregation on devices and global aggregation on the server; (3) scheduling tasks to mitigate straggler problems and improve computing utilization; (4) a distributed client state manager that supports various FL algorithms. Moreover, built upon our generic APIs and communication interfaces, users can seamlessly transform a simulation into a real-world deployment without modifying code. Through extensive experiments training diverse models on various FL datasets, we demonstrate that Parrot can simulate over 1000 clients (stateful or stateless) with flexible GPU settings (4 ∼ 32 devices) and high GPU utilization, running 1.2 ∼ 4 times faster than FedScale while using 10 ∼ 100 times less memory than FedML. We also verify that Parrot works well with homogeneous and heterogeneous devices in three different clusters. Two FL algorithms with stateful clients and four algorithms with stateless clients are simulated to verify Parrot's broad applicability to different algorithms.
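The combination of points (1) and (2) can be illustrated with a minimal sketch. This is not the FedML Parrot API: the model is a toy scalar, and `train_client` is a hypothetical stand-in for a client's local training step. The key idea it shows is that each device trains its assigned clients one after another and sends the server only a weighted partial aggregate, rather than one update per client.

```python
# Minimal sketch (hypothetical, not the FedML Parrot API): sequential client
# training per simulated device, with aggregation split into a local stage
# (on each device) and a global stage (on the server).

def train_client(global_model, client_data):
    # Toy local update: nudge the scalar model toward the client's data mean.
    return global_model + 0.1 * (sum(client_data) / len(client_data) - global_model)

def device_round(global_model, clients_on_device):
    """Train this device's clients sequentially, then aggregate locally."""
    updates, weights = [], []
    for data in clients_on_device:
        updates.append(train_client(global_model, data))  # one client at a time
        weights.append(len(data))                         # weight by sample count
    local_sum = sum(u * w for u, w in zip(updates, weights))
    return local_sum, sum(weights)  # a single partial aggregate per device

def global_round(global_model, devices):
    """Server combines per-device partial aggregates (global aggregation)."""
    total_sum, total_weight = 0.0, 0
    for clients in devices:
        s, w = device_round(global_model, clients)
        total_sum += s
        total_weight += w
    return total_sum / total_weight

# Usage: 2 simulated devices, each hosting 3 clients trained sequentially.
devices = [
    [[1.0, 2.0], [2.0, 2.0], [3.0]],
    [[0.0], [4.0, 4.0], [1.0, 1.0]],
]
model = 0.0
for _ in range(5):
    model = global_round(model, devices)
```

Because each device forwards only `(local_sum, total_weight)`, server-side communication and memory scale with the number of devices rather than the number of simulated clients, which is one way such a design can relax hardware requirements.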


