C^3LES: Codes for Coded Computation that Leverage Stragglers

09/17/2018
by Anindya B. Das, et al.

In distributed computing systems, it is well recognized that slow worker nodes (called stragglers) tend to dominate the overall job execution time. Coded computation uses concepts from erasure coding to mitigate the effect of stragglers by running "coded" copies of the tasks that make up a job. Stragglers are typically treated as erasures in this process. While this is useful, applying codes such as MDS codes in a straightforward manner raises issues. Specifically, several applications, such as matrix-vector products, deal with sparse matrices. MDS codes typically require dense linear combinations of submatrices of the original matrix, which destroys the matrix's inherent sparsity. This is problematic because it results in significantly higher processing times for computing the submatrix-vector products in coded computation. Furthermore, treating stragglers as erasures ignores the partial computations they have already completed. In this work, we propose a fine-grained model that quantifies the level of non-trivial coding needed to obtain the benefits of coding in matrix-vector computation. Simultaneously, the model allows us to leverage partial computations performed by the straggler nodes. For this model, we propose and evaluate several code designs and discuss their properties.
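
To make the baseline concrete, the following is a minimal sketch of the conventional approach the abstract critiques, not the code designs proposed in the paper. It assumes a toy setting with three workers and a (3, 2) single-parity code, which is MDS; the matrix, the vector, and the helper names (`matvec`, `nonzeros`, `decode`) are all hypothetical and chosen only for illustration. The sketch shows two things: the product can be recovered from any two workers, and the coded block has more nonzero entries than the uncoded blocks.

```python
# Toy (3, 2) coded matrix-vector product. Illustrative only: the data and
# helper names are assumptions, not taken from the paper.

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def nonzeros(M):
    """Count the nonzero entries of a matrix."""
    return sum(1 for row in M for v in row if v != 0)

# Sparse 4x4 matrix A, split row-wise into two blocks A1 and A2.
A = [
    [2, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 5],
]
x = [1, 2, 3, 4]
A1, A2 = A[:2], A[2:]

# Worker 3 stores the coded block A1 + A2 (a single parity block).
A3 = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]

def decode(results):
    """Recover A times x from the results of any two of the three workers.

    `results` maps a worker index (1, 2, or 3) to its block-vector product.
    The missing worker is treated as an erasure.
    """
    if 1 in results and 2 in results:
        y1, y2 = results[1], results[2]
    elif 1 in results:
        y1 = results[1]
        y2 = [c - a for c, a in zip(results[3], y1)]
    else:
        y2 = results[2]
        y1 = [c - b for c, b in zip(results[3], y2)]
    return y1 + y2

full = {1: matvec(A1, x), 2: matvec(A2, x), 3: matvec(A3, x)}
expected = matvec(A, x)

for straggler in (1, 2, 3):
    survivors = {w: r for w, r in full.items() if w != straggler}
    print(straggler, decode(survivors) == expected)

# The coded block is denser than either uncoded block.
print(nonzeros(A1), nonzeros(A2), nonzeros(A3))
```

In this toy case the coded block has twice as many nonzeros as each uncoded block, so the worker holding it does more work per product. The decoder also discards whatever the straggler has finished, which is the second limitation the abstract points out.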


