Making Byzantine Decentralized Learning Efficient

09/22/2022
by Sadegh Farhadkhani et al.

Decentralized SGD (D-SGD) distributes heavy learning tasks across multiple machines (a.k.a. nodes), effectively dividing the workload per node by the size of the system. However, a handful of Byzantine (i.e., misbehaving) nodes can jeopardize the entire learning procedure, and this vulnerability is further amplified when the system is asynchronous. Although approaches that confer Byzantine resilience to D-SGD have been proposed, these significantly impact the efficiency of the process, to the point of even negating the benefit of decentralization. This naturally raises the question: can decentralized learning simultaneously enjoy Byzantine resilience and a reduced workload per node? We answer positively by proposing an algorithm that ensures Byzantine resilience without losing the computational efficiency of D-SGD. Essentially, our algorithm weakens the impact of Byzantine nodes by reducing the variance of local updates using Polyak's momentum. Then, by establishing coordination between nodes via signed echo broadcast and a nearest-neighbor averaging scheme, we effectively tolerate Byzantine nodes while distributing the overhead amongst the non-Byzantine nodes. To demonstrate the correctness of our algorithm, we introduce and analyze a novel Lyapunov function that accounts for the non-Markovian model drift arising from the use of momentum. We also demonstrate the efficiency of our algorithm through experiments on several image classification tasks.
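The two local ingredients described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: the function names are made up, and a coordinate-wise trimmed mean stands in as one concrete example of a robust averaging rule over neighbor models (the paper's nearest-neighbor averaging scheme and its signed echo broadcast coordination are not reproduced here).

```python
import numpy as np

def momentum_update(m, grad, beta=0.9):
    """Polyak's momentum: exponentially average stochastic gradients,
    which reduces the variance of each node's local update and thereby
    limits how much a Byzantine node can perturb any single step."""
    return beta * m + (1.0 - beta) * grad

def trimmed_mean(neighbor_params, f):
    """Illustrative robust aggregation over neighbors' model vectors:
    per coordinate, drop the f smallest and f largest values (which may
    come from Byzantine nodes), then average the remainder."""
    stacked = np.sort(np.stack(neighbor_params), axis=0)
    return stacked[f:len(neighbor_params) - f].mean(axis=0)

# Toy example: one Byzantine neighbor sends an outlier model.
models = [np.array([0.0]), np.array([1.0]), np.array([2.0]),
          np.array([3.0]), np.array([100.0])]  # last one is Byzantine
robust_avg = trimmed_mean(models, f=1)  # outlier 100.0 is trimmed away
```

With `f=1`, the extreme values 0.0 and 100.0 are discarded per coordinate, so the Byzantine outlier cannot drag the average arbitrarily far, unlike plain averaging.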


Related research

- SGD: Decentralized Byzantine Resilience (05/05/2019)
- Asynchronous Byzantine Machine Learning (02/22/2018)
- Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums (05/24/2022)
- Coded State Machine -- Scaling State Machine Execution under Byzantine Faults (06/26/2019)
- Collaborative Learning as an Agreement Problem (08/03/2020)
- Differential Privacy and Byzantine Resilience in SGD: Do They Add Up? (02/16/2021)
- Combining Differential Privacy and Byzantine Resilience in Distributed SGD (10/08/2021)
