Election Coding for Distributed Learning: Protecting SignSGD against Byzantine Attacks

by Jy-yong Sohn, et al.

Recent advances in large-scale distributed learning algorithms have enabled communication-efficient training via SignSGD. Unfortunately, a major issue continues to plague distributed learning: Byzantine failures can seriously degrade learning accuracy. This paper proposes Election Coding, a coding-theoretic framework that guarantees Byzantine-robustness for SignSGD with Majority Vote, which uses minimum worker-master communication in both directions. The suggested framework explores new information-theoretic limits of finding the majority opinion when some workers could be malicious, and paves the way for implementing robust and efficient distributed learning algorithms. Under this framework, we construct two types of explicit codes, random Bernoulli codes and deterministic algebraic codes, that can tolerate Byzantine attacks with a controlled amount of computational redundancy. For the Bernoulli codes, we provide upper bounds on the error probability in estimating the majority opinion, which give useful insights into code design for tolerating Byzantine attacks. For the deterministic codes, we construct an explicit code that perfectly tolerates Byzantines, and provide tight upper and lower bounds on the minimum required computational redundancy. Finally, the Byzantine-tolerance of the suggested coding schemes is confirmed by deep learning experiments on Amazon EC2 using Python with the MPI4py package.
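The idea of trading computational redundancy for robustness of the majority vote can be illustrated with a toy sketch for a single gradient coordinate. This is not the paper's construction: the abstract does not give the encoding rule, so the sketch assumes that each worker reports the majority sign over the data partitions it is assigned, and that the master takes the majority of the reported signs. The helper names (`plain_majority_vote`, `bernoulli_allocation`, `coded_majority_vote`) and the sign-flipping attack model are hypothetical, chosen only for illustration.

```python
import random


def sign(x):
    """Return +1 for non-negative input and -1 otherwise (ties go to +1)."""
    return 1 if x >= 0 else -1


def majority(votes):
    """Majority opinion of a list of +1/-1 votes."""
    return sign(sum(votes))


def plain_majority_vote(part_signs, byzantine):
    """Uncoded baseline: worker i holds partition i and sends its sign.

    Workers whose index is in `byzantine` send the flipped sign
    (an assumed attack model).
    """
    sent = [-s if i in byzantine else s for i, s in enumerate(part_signs)]
    return majority(sent)


def bernoulli_allocation(n_workers, n_parts, p, rng):
    """Assign each partition to each worker independently with probability p.

    A worker that would receive nothing gets one random partition,
    so every worker has something to vote on (an assumption of this sketch).
    """
    alloc = []
    for _ in range(n_workers):
        row = [k for k in range(n_parts) if rng.random() < p]
        if not row:
            row = [rng.randrange(n_parts)]
        alloc.append(row)
    return alloc


def coded_majority_vote(part_signs, alloc, byzantine):
    """Each worker sends the majority sign over its assigned partitions;
    Byzantine workers flip what they send; the master takes the majority."""
    sent = []
    for j, row in enumerate(alloc):
        local = majority([part_signs[k] for k in row])
        sent.append(-local if j in byzantine else local)
    return majority(sent)


if __name__ == "__main__":
    part_signs = [1, 1, 1, -1, -1]   # true majority opinion is +1
    byzantine = {0}                  # worker 0 is malicious
    rng = random.Random(0)

    print("uncoded:", plain_majority_vote(part_signs, byzantine))
    for p in (0.4, 0.7, 1.0):
        alloc = bernoulli_allocation(5, 5, p, rng)
        load = sum(len(row) for row in alloc) / len(part_signs)
        print(f"p={p}: redundancy={load:.1f},",
              "vote:", coded_majority_vote(part_signs, alloc, byzantine))
```

In this toy setting a single sign-flipping worker overturns the uncoded vote, while full replication (`p = 1.0`) recovers the true majority at five times the computation. Intermediate values of `p` sit between these extremes, which is the kind of redundancy-versus-error trade-off the Bernoulli-code bounds are said to characterize.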



