Trading Communication for Computation in Byzantine-Resilient Gradient Coding

03/23/2023
by Christoph Hofmeister, et al.

We consider gradient coding in the presence of an adversary that controls so-called malicious workers and tries to corrupt the computations. Previous works propose the use of MDS codes to treat the inputs of the malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers by which each partial gradient is computed. In this work, we reduce the replication by proposing a method that detects the erroneous inputs from the malicious workers, hence transforming errors into erasures. For s malicious workers, our solution reduces the replication per partial gradient from 2s+1 to s+1, at the expense of only s additional computations at the main node and additional rounds of light communication between the main node and the workers. We give fundamental limits of the general framework for fractional repetition data allocations. Our scheme is optimal in terms of replication and local computation, but incurs a communication cost that is asymptotically, in the size of the dataset, a multiplicative factor away from the derived bound.
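To make the error-to-erasure idea concrete, the following is a minimal toy simulation in Python. It is not the authors' construction: it only assumes a fractional repetition allocation in which each partial gradient is assigned to s+1 workers, and it lets the main node recompute at most s partial gradients itself whenever the replicas of a partition disagree, discarding the inconsistent replies as erasures. All variable names, the worker model, and the simplified disagreement check are illustrative assumptions.

```python
# Toy illustration (not the paper's scheme): fractional repetition with
# replication s + 1. Since at most s of the s + 1 replicas of a partition can
# be malicious, any corruption produces a disagreement that the main node can
# detect; it then recomputes that partial gradient itself (at most s times
# here, since each toy worker handles exactly one partition).
import numpy as np

rng = np.random.default_rng(0)

s = 2                      # assumed number of malicious workers
k = 4                      # number of data partitions / partial gradients
replication = s + 1        # each partial gradient is computed by s + 1 workers
n = k * replication        # one toy worker per (partition, replica) slot

# Stand-ins for the true partial gradients of the k data partitions.
true_partials = rng.normal(size=(k, 3))

# Fractional repetition allocation: workers j*replication, ..., j*replication+s
# all compute partial gradient j.
assignment = {w: w // replication for w in range(n)}

malicious = set(rng.choice(n, size=s, replace=False))

def worker_reply(w):
    """Return the worker's claimed partial gradient; malicious workers lie."""
    g = true_partials[assignment[w]].copy()
    if w in malicious:
        g += rng.normal(scale=10.0, size=g.shape)   # corrupted result
    return g

replies = {w: worker_reply(w) for w in range(n)}

# Main node: for each partition, check whether all replicas agree. If they do
# not, the erroneous replies are treated as erasures and the main node
# computes that partial gradient locally.
recomputed = 0
aggregate = np.zeros(3)
for j in range(k):
    group = [replies[w] for w in range(n) if assignment[w] == j]
    if all(np.allclose(group[0], g) for g in group[1:]):
        aggregate += group[0]
    else:
        recomputed += 1
        aggregate += true_partials[j]   # main node recomputes partition j

print("recomputations at main node:", recomputed)
print("aggregate correct:", np.allclose(aggregate, true_partials.sum(axis=0)))
```

Even if all malicious replicas of a partition collude on the same wrong value, the single remaining honest replica breaks unanimity, which is why detection (rather than majority decoding) suffices with replication s+1 in this toy setting.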


