Coded sparse matrix computation schemes that leverage partial stragglers

12/11/2020
by Anindya Bijoy Das, et al.

Distributed matrix computations over large clusters can suffer from slow or failed worker nodes (called stragglers), which can dominate the overall job execution time. Coded computation uses concepts from erasure coding to mitigate the effect of stragglers by running 'coded' copies of the tasks that make up a job; stragglers are typically treated as erasures. While this is useful, applying maximum distance separable (MDS) codes, for example, in a straightforward manner raises several issues. First, many practical matrix computation scenarios involve sparse matrices, and MDS codes typically require dense linear combinations of submatrices of the original matrices, which destroy their inherent sparsity. This results in significantly higher worker computation times. Second, treating slow nodes as erasures ignores the potentially useful partial computations they have performed. Third, some MDS techniques suffer from significant numerical stability issues. In this work, we present schemes that leverage partial computations by stragglers while constraining the level of coding required to generate the encoded submatrices. This significantly reduces worker computation time compared to previous approaches and improves numerical stability in the decoding process. Exhaustive numerical experiments on Amazon Web Services (AWS) clusters support our findings.
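
The abstract's point about dense linear combinations destroying sparsity can be illustrated with a small sketch. It is not the scheme from the paper: the matrix size, the 2% density, the number of blocks `k`, the coefficient choices, and all function names below are assumptions made only for illustration. The sketch splits a random sparse matrix into `k` block-rows and compares the fraction of nonzero entries in an uncoded block, in a combination of only two blocks, and in a combination of all `k` blocks.

```python
import random


def random_sparse(rows, cols, density, rng):
    """Build a rows x cols matrix (list of lists) with about `density` nonzeros."""
    return [[rng.uniform(1.0, 2.0) if rng.random() < density else 0.0
             for _ in range(cols)] for _ in range(rows)]


def split_rows(matrix, k):
    """Split a matrix into k equally sized block-rows."""
    size = len(matrix) // k
    return [matrix[i * size:(i + 1) * size] for i in range(k)]


def combine(blocks, coeffs):
    """Return sum_i coeffs[i] * blocks[i], skipping blocks with a zero coefficient."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for block, c in zip(blocks, coeffs):
        if c == 0:
            continue
        for r in range(rows):
            for j in range(cols):
                out[r][j] += c * block[r][j]
    return out


def density(matrix):
    """Fraction of entries that are nonzero."""
    total = len(matrix) * len(matrix[0])
    nnz = sum(1 for row in matrix for x in row if x != 0.0)
    return nnz / total


rng = random.Random(0)          # fixed seed so the run is repeatable
k = 5                           # hypothetical number of block-rows
A = random_sparse(200, 100, 0.02, rng)
blocks = split_rows(A, k)

uncoded = blocks[0]                               # no coding at all
low_weight = combine(blocks, [1, 1, 0, 0, 0])     # combines only 2 blocks
dense_coded = combine(blocks, [1, 2, 3, 4, 5])    # combines all k blocks

d_uncoded = density(uncoded)
d_low = density(low_weight)
d_dense = density(dense_coded)

print(f"uncoded block density:      {d_uncoded:.4f}")
print(f"2-block combination density: {d_low:.4f}")
print(f"{k}-block combination density: {d_dense:.4f}")
```

Because all entries and coefficients here are positive, no cancellation occurs, so the nonzero pattern of a combination is the union of the patterns of the blocks it mixes. A worker multiplying the fully combined block therefore handles roughly `k` times as many nonzeros as one holding an uncoded block, which is why limiting the number of submatrices per combination can reduce worker computation time.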


