Fast Machine Learning with Byzantine Workers and Servers

11/18/2019
by   El Mahdi El Mhamdi, et al.

Machine Learning (ML) solutions are nowadays distributed and are prone to various types of component failures, which can all be modeled as Byzantine behavior. This paper introduces LiuBei, a Byzantine-resilient ML algorithm that does not trust any individual component in the network (neither workers nor servers), nor does it induce additional communication rounds (on average) compared to standard non-Byzantine-resilient algorithms. LiuBei builds upon gradient aggregation rules (GARs) to tolerate a minority of Byzantine workers. In addition, LiuBei replicates the parameter server on multiple machines instead of trusting it. We introduce a novel filtering mechanism that enables workers to filter out replies from Byzantine server replicas without requiring communication with all servers. This filtering mechanism is based on network synchrony, Lipschitz continuity of the loss function, and the GAR used to aggregate workers' gradients. We also introduce a protocol, scatter/gather, to bound drifts between models on correct servers with a small number of communication messages. We theoretically prove that LiuBei achieves Byzantine resilience to both servers and workers and guarantees convergence. We build LiuBei using TensorFlow, and we show that LiuBei tolerates Byzantine behavior with an accuracy loss of around 5% compared to vanilla TensorFlow. We moreover show that the throughput gain of LiuBei compared to another state-of-the-art Byzantine-resilient ML algorithm (one that assumes network asynchrony) is about 70%.
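To make the two ingredients named in the abstract concrete, here is a minimal Python/NumPy sketch of (a) one possible Byzantine-resilient GAR, the coordinate-wise median, and (b) a Lipschitz-based consistency check of the kind a worker could apply to a server reply. The function names, the choice of median as the GAR, and the constant `lipschitz_k` are illustrative assumptions, not LiuBei's actual API or its exact filtering rule.

```python
import numpy as np

def median_gar(gradients):
    """Coordinate-wise median over workers' gradients.
    One example of a Byzantine-resilient GAR (assumed here for illustration);
    it tolerates a minority of arbitrarily corrupted gradients per coordinate."""
    return np.median(np.stack(gradients, axis=0), axis=0)

def lipschitz_filter(model_prev, model_new, grad_prev, grad_new, lipschitz_k):
    """Accept a server reply only if the observed change in gradients is
    consistent with an L-Lipschitz gradient of the loss.
    `lipschitz_k` is an assumed (estimated) Lipschitz constant."""
    model_drift = np.linalg.norm(model_new - model_prev)
    grad_drift = np.linalg.norm(grad_new - grad_prev)
    return grad_drift <= lipschitz_k * model_drift

# Toy usage: 5 workers, one of which sends an arbitrarily corrupted gradient.
honest = [np.random.randn(4) for _ in range(4)]
byzantine = [np.full(4, 1e6)]
aggregated = median_gar(honest + byzantine)  # close to the honest gradients
```

The point of the sketch is only to show the shape of the mechanisms: a GAR collapses many (possibly corrupted) gradients into one robust estimate, while the Lipschitz check gives a worker a local, communication-free test for rejecting implausible replies from server replicas.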


