A Distributed and Elastic Aggregation Service for Scalable Federated Learning Systems

by Ahmad Khan, et al.

Federated Learning promises a new approach to machine learning challenges by bringing computation to the data. The popularity of the approach has led to rapid progress on its algorithmic aspects and to the emergence of systems capable of simulating Federated Learning. State-of-the-art Federated Learning systems support only a single-node aggregator, which is insufficient for training across a large number of devices or for training larger models. As the model size or the number of devices increases, the single-node aggregator incurs a memory and computation burden while performing fusion tasks. It also faces communication bottlenecks when a large number of model updates are sent to a single node. We classify the aggregator workload into categories and propose a new aggregation service that handles each category. Our aggregation service takes a holistic approach, choosing the best solution depending on the model update size and the number of clients. Our system provides a fault-tolerant, robust, and efficient aggregation solution that utilizes existing parallel and distributed frameworks. Through evaluation, we show the shortcomings of state-of-the-art approaches and show that a single solution is not suitable for all aggregation requirements. We also compare current frameworks with our system through extensive experiments.
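To make the memory burden of a fusion task concrete, here is a minimal Python sketch. It is not the aggregation service described in the abstract: it assumes fusion is a sample-weighted average of client updates (in the style of federated averaging), and the names fuse_streaming and buffered_bytes, as well as the 4-bytes-per-parameter default, are hypothetical choices made for illustration only.

```python
from typing import Iterable, List, Tuple


def fuse_streaming(updates: Iterable[Tuple[List[float], int]]) -> List[float]:
    """Sample-weighted average of client updates, folded in one at a time.

    Each item is (parameters, num_samples). Only one running sum is kept,
    so memory stays at roughly one update regardless of the client count.
    """
    total_weight = 0
    running: List[float] = []
    for params, num_samples in updates:
        if not running:
            # Size the accumulator from the first update seen.
            running = [0.0] * len(params)
        if len(params) != len(running):
            raise ValueError("all updates must have the same length")
        for i, value in enumerate(params):
            running[i] += value * num_samples
        total_weight += num_samples
    if total_weight == 0:
        raise ValueError("no updates to fuse")
    return [s / total_weight for s in running]


def buffered_bytes(num_clients: int, params_per_update: int,
                   bytes_per_param: int = 4) -> int:
    """Memory needed if a single node holds every update before fusing.

    Assumes dense updates with a fixed parameter width (hypothetical).
    """
    return num_clients * params_per_update * bytes_per_param
```

Under these assumptions, buffered_bytes(1000, 1_000_000) evaluates to 4,000,000,000 bytes, which shows how a node that buffers every update sees its memory grow with both the number of clients and the update size. A streaming fold avoids the buffering but still funnels all communication through one node, which is the bottleneck the abstract points to.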


Aggregation Delayed Federated Learning

Federated learning is a distributed machine learning paradigm where mult...

FLRA: A Reference Architecture for Federated Learning Systems

Federated learning is an emerging machine learning paradigm that enables...

SAFER: Sparse secure Aggregation for FEderated leaRning

Federated learning enables one to train a common machine learning model ...

Scalable federated machine learning with FEDn

Federated machine learning has great promise to overcome the input priva...

Aggregation Service for Federated Learning: An Efficient, Secure, and More Resilient Realization

Federated learning has recently emerged as a paradigm promising the bene...

SOAR: Minimizing Network Utilization with Bounded In-network Computing

In-network computing via smart networking devices is a recent trend for ...

Constrained In-network Computing with Low Congestion in Datacenter Networks

Distributed computing has become a common practice nowadays, where the r...
