SGD: Decentralized Byzantine Resilience

05/05/2019
by El Mahdi El Mhamdi, et al.

The size of today's datasets makes it necessary to distribute Machine Learning (ML) tasks. An SGD-based optimization, for instance, is typically carried out by two categories of participants: parameter servers and workers. Some of these nodes can behave arbitrarily (such nodes are called Byzantine; the behavior is caused by corrupt or bogus data or machines), degrading the accuracy of the entire learning activity. Several recent approaches studied how to tolerate Byzantine workers while assuming honest and trusted parameter servers. To achieve total ML robustness, we introduce GuanYu, the first algorithm (to the best of our knowledge) to handle Byzantine parameter servers as well as Byzantine workers. We prove that GuanYu ensures convergence against 1/3 Byzantine parameter servers and 1/3 Byzantine workers, which is optimal in asynchronous networks (GuanYu also tolerates unbounded communication delays, i.e., asynchrony). To prove the Byzantine resilience of GuanYu, we use a contraction argument, leveraging geometric properties of the median in high-dimensional spaces to prevent (with probability 1) any drift of the models within each of the non-Byzantine servers. To show its practicality, we implemented GuanYu using the low-level TensorFlow APIs and deployed it in a distributed setup using the CIFAR-10 dataset. The overhead of tolerating Byzantine participants, compared to a vanilla TensorFlow deployment that is vulnerable to a single Byzantine participant, is around 30% in terms of throughput (model updates per second), while maintaining the same convergence rate (model updates required to reach a given accuracy).
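As a rough illustration of why a median-based aggregation can resist a Byzantine minority, the sketch below compares a coordinate-wise median with a plain average when fewer than one third of the received gradients are adversarial. This is not GuanYu's actual aggregation rule, which the abstract does not spell out; the coordinate-wise median is a simple stand-in, and the names `coordinate_wise_median`, `coordinate_wise_mean`, and `simulate`, as well as the chosen sizes and noise level, are assumptions made for this example.

```python
import random
from statistics import median


def coordinate_wise_median(vectors):
    """Aggregate equal-length vectors by taking the median of each coordinate."""
    return [median(coords) for coords in zip(*vectors)]


def coordinate_wise_mean(vectors):
    """Aggregate equal-length vectors by plain averaging (not Byzantine-resilient)."""
    return [sum(coords) / len(coords) for coords in zip(*vectors)]


def simulate(num_honest=7, num_byzantine=3, dim=4, seed=0):
    """Hypothetical setup: honest workers send noisy copies of a true gradient,
    Byzantine workers send an arbitrary, very large vector."""
    rng = random.Random(seed)
    true_gradient = [1.0] * dim
    honest = [
        [g + rng.gauss(0.0, 0.1) for g in true_gradient]
        for _ in range(num_honest)
    ]
    byzantine = [[-1000.0] * dim for _ in range(num_byzantine)]
    received = honest + byzantine
    return coordinate_wise_median(received), coordinate_wise_mean(received)


robust, naive = simulate()
print("median aggregate:", robust)  # stays close to the honest gradient
print("mean aggregate:  ", naive)   # dragged far away by the Byzantine vectors
```

With 3 Byzantine vectors out of 10, each coordinate of the median is bracketed by honest values, whereas the average is pulled arbitrarily far by the adversarial inputs.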

research
11/18/2019

Fast Machine Learning with Byzantine Workers and Servers

Machine Learning (ML) solutions are nowadays distributed and are prone t...
research
09/22/2022

Making Byzantine Decentralized Learning Efficient

Decentralized-SGD (D-SGD) distributes heavy learning tasks across multip...
research
10/12/2020

Garfield: System Support for Byzantine Machine Learning

Byzantine Machine Learning (ML) systems are nowadays vulnerable for they...
research
02/22/2018

Asynchronous Byzantine Machine Learning

Asynchronous distributed machine learning solutions have proven very eff...
research
05/05/2019

Fast and Secure Distributed Learning in High Dimension

Modern machine learning is distributed and the work of several machines ...
research
05/31/2022

Dropbear: Machine Learning Marketplaces made Trustworthy with Byzantine Model Agreement

Marketplaces for machine learning (ML) models are emerging as a way for ...
research
02/16/2021

Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?

This paper addresses the problem of combining Byzantine resilience with ...
