Garfield: System Support for Byzantine Machine Learning

10/12/2020
by El Mahdi El Mhamdi, et al.

Existing Byzantine Machine Learning (ML) systems are vulnerable because they require trusted machines and/or a synchronous network. We present Garfield, a system that provably achieves Byzantine resilience in ML applications without assuming any trusted component or any bound on communication or computation delays. Garfield leverages the specificities of ML to make progress despite consensus being impossible in such an asynchronous, Byzantine environment. Following the classical server/worker architecture, Garfield replicates the parameter server and relies on the statistical properties of stochastic gradient descent to keep the models on the correct servers close to each other. Against Byzantine workers, Garfield uses statistically-robust gradient aggregation rules (GARs). We integrate Garfield with two widely-used ML frameworks, TensorFlow and PyTorch, while achieving transparency: applications developed with either framework do not need to change their interfaces to become Byzantine resilient. Our implementation supports full-stack computations on both CPUs and GPUs. We report on our evaluation of Garfield with different (a) baselines, (b) ML models (e.g., ResNet-50 and VGG), and (c) hardware infrastructures (CPUs and GPUs). Our evaluation highlights several interesting facts about the cost of Byzantine resilience. In particular, (a) Byzantine resilience, unlike crash resilience, induces an accuracy loss, and (b) the throughput overhead comes much more from communication (70…
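To make the worker-side defense concrete, below is a minimal sketch of one statistically-robust GAR, the coordinate-wise median, whose output is unaffected by a minority of arbitrarily corrupted inputs. This is an illustrative toy under our own assumptions (the function name median_gar and the synthetic data are ours), not Garfield's actual implementation, which supports several GARs and integrates with the frameworks' training loops.

```python
# Hedged sketch of a statistically-robust gradient aggregation rule (GAR):
# the coordinate-wise median over worker gradients. Illustrative only.
import torch

def median_gar(gradients):
    """Aggregate flat worker gradients with the coordinate-wise median.

    The median of n values is unaffected by fewer than n/2 arbitrarily
    corrupted inputs, which is the intuition behind this class of GARs.
    """
    stacked = torch.stack(gradients)     # shape: (n_workers, n_params)
    return stacked.median(dim=0).values  # per-coordinate robust estimate

# Toy usage: four honest workers whose gradients cluster around 1.0,
# plus one Byzantine worker sending an arbitrarily large vector.
honest = [torch.ones(8) + 0.01 * torch.randn(8) for _ in range(4)]
byzantine = [torch.full((8,), 1e9)]
print(median_gar(honest + byzantine))  # stays close to the honest values
```

Averaging the same five gradients would be dragged to roughly 2e8 per coordinate; the median ignores the outlier entirely, which is why robust GARs replace plain averaging in Byzantine settings.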

Related research

- 11/18/2019: Fast Machine Learning with Byzantine Workers and Servers
  Machine Learning (ML) solutions are nowadays distributed and are prone t...

- 04/20/2023: Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search
  Modern machine learning (ML) models are capable of impressive performanc...

- 05/05/2019: SGD: Decentralized Byzantine Resilience
  The size of the datasets available today leads to distribute Machine Lea...

- 05/31/2022: Dropbear: Machine Learning Marketplaces made Trustworthy with Byzantine Model Agreement
  Marketplaces for machine learning (ML) models are emerging as a way for ...

- 02/16/2021: Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?
  This paper addresses the problem of combining Byzantine resilience with ...

- 03/09/2021: Proof-of-Learning: Definitions and Practice
  Training machine learning (ML) models typically involves expensive itera...

- 02/17/2022: An Equivalence Between Data Poisoning and Byzantine Gradient Attacks
  To study the resilience of distributed learning, the "Byzantine" literat...
