A Unified Treatment of Partial Stragglers and Sparse Matrices in Coded Matrix Computation

09/24/2021
by   Anindya Bijoy Das, et al.

The overall execution time of distributed matrix computations is often dominated by slow worker nodes (stragglers) in the cluster. Recently, various coding techniques have been used to mitigate the effect of stragglers: worker nodes are assigned the task of processing encoded submatrices of the original matrices. In many machine learning and optimization problems the relevant matrices are sparse. Several coded computation methods operate with dense linear combinations of the original submatrices; this can significantly increase the worker node computation times and consequently the overall job execution time. Moreover, several existing techniques treat the stragglers as failures (erasures) and discard their computations. In this work, we present a coding approach that operates with limited encoding of the original submatrices and utilizes the partial computations done by the slower workers. Our scheme retains the optimal recovery threshold of prior work. Extensive numerical experiments on an AWS (Amazon Web Services) cluster confirm that the proposed approach significantly speeds up the worker computations, and thus the overall job.
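To make the erasure-coding idea concrete, here is a minimal toy sketch (not the paper's scheme) of straggler-resilient matrix-vector multiplication with a recovery threshold: the matrix is split into k submatrices, encoded into n coded blocks via a Vandermonde generator, and the product is recovered from any k of the n worker results. The paper's contribution replaces such dense combinations with limited (sparse-friendly) encodings and additionally exploits partial work from slow workers, which this sketch does not model.

```python
import numpy as np

# Toy illustration: split A into k row-blocks and encode them into n coded
# blocks with an n x k Vandermonde generator, so that the outputs of ANY k
# of the n workers suffice to recover A @ x (recovery threshold k).
rng = np.random.default_rng(0)
k, n = 3, 5                       # k data blocks, n workers
A = rng.standard_normal((6, 4))   # 6 rows -> 3 blocks of 2 rows each
x = rng.standard_normal(4)

blocks = np.split(A, k)                       # the k submatrices of A
evals = np.arange(1, n + 1, dtype=float)
G = np.vander(evals, k, increasing=True)      # n x k generator matrix

# Worker i computes (sum_j G[i, j] * blocks[j]) @ x on its coded block.
worker_out = [sum(G[i, j] * blocks[j] for j in range(k)) @ x
              for i in range(n)]

# Suppose workers 0 and 3 straggle; decode from any k = 3 survivors by
# inverting the corresponding k x k submatrix of the generator.
fast = [1, 2, 4]
Y = np.stack([worker_out[i] for i in fast])   # k x 2 stacked results
decoded = np.linalg.solve(G[fast, :], Y)      # rows are blocks[j] @ x
result = decoded.reshape(-1)

assert np.allclose(result, A @ x)             # full product recovered
```

Note that each coded block here is a dense combination of all k submatrices, so coded blocks of a sparse A become dense; avoiding exactly this densification is the motivation for the limited encodings proposed in the paper.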

