Straggler-resistant distributed matrix computation via coding theory

02/10/2020
by   Aditya Ramamoorthy, et al.

The current big data era routinely requires processing large-scale data on massive distributed computing clusters. Such clusters often suffer from "stragglers", defined as slow or failed nodes. Without a sophisticated assignment of tasks to the worker nodes, the overall speed of a computational job on these clusters is typically dominated by stragglers. In recent years, approaches based on coding theory (referred to as "coded computation") have been used effectively for straggler mitigation. Coded computation offers significant benefits for specific classes of problems such as distributed matrix computations, which play a crucial role in several parts of the machine learning pipeline. The essential idea is to create redundant tasks so that the desired result can be recovered as long as a certain number of worker nodes complete their tasks. In this survey article, we give an overview of recent developments in coding for straggler-resilient distributed matrix computations.
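
The idea of redundant tasks can be illustrated with a small sketch. It is not taken from the article: it assumes a toy setup of three workers and a single parity block, where the matrix A (with an even number of rows) is split into two row blocks A1 and A2, and a third worker is given A1 + A2. The function names `matvec`, `encode`, and `decode` are hypothetical. Under these assumptions, the product A x can be recovered from any two of the three workers, so one straggler can be ignored.

```python
from itertools import combinations


def matvec(M, x):
    """Plain matrix-vector product on lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def encode(A):
    """Split A into two row blocks and add one parity block A1 + A2.

    Assumes A has an even number of rows (illustrative simplification).
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]  # task for worker 0, 1, 2


def decode(results):
    """Recover A x from the partial products of any two workers.

    results maps a worker index (0, 1, or 2) to that worker's output.
    """
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        # Worker 1 straggled: A2 x = (A1 + A2) x - A1 x
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:
        # Worker 0 straggled: A1 x = (A1 + A2) x - A2 x
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
partials = [matvec(task, x) for task in encode(A)]

# Every choice of two finished workers yields the same product.
for finished in combinations(range(3), 2):
    recovered = decode({i: partials[i] for i in finished})
    print(finished, recovered, recovered == matvec(A, x))
```

Each worker here handles half of the rows of A, so the redundancy costs one extra half-size task in exchange for tolerating any single slow or failed node. The schemes covered in the survey generalize this trade-off to more workers and to matrix-matrix products.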

research
07/18/2019

Random Convolutional Coding for Robust and Straggler Resilient Distributed Matrix Computation

Distributed matrix computations (matrix-vector and matrix-matrix multipl...
research
04/15/2019

Distributed Matrix Multiplication Using Speed Adaptive Coding

While performing distributed computations in today's cloud-based platfor...
research
12/11/2020

Coded sparse matrix computation schemes that leverage partial stragglers

Distributed matrix computations over large clusters can suffer from the ...
research
01/30/2020

Weighted Gradient Coding with Leverage Score Sampling

A major hurdle in machine learning is scalability to massive datasets. A...
research
02/10/2015

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

In this era of large-scale data, distributed systems built on top of clu...
research
08/08/2023

Preserving Sparsity and Privacy in Straggler-Resilient Distributed Matrix Computations

Existing approaches to distributed matrix computations involve allocatin...
research
04/25/2018

Fundamental Limits of Coded Linear Transform

In large scale distributed linear transform problems, coded computation ...
