Random Convolutional Coding for Robust and Straggler Resilient Distributed Matrix Computation

07/18/2019
by Anindya B. Das, et al.

Distributed matrix computations (matrix-vector and matrix-matrix multiplications) are at the heart of several tasks within the machine learning pipeline. However, distributed clusters are well known to suffer from stragglers (slow or failed nodes). Prior work in this area has presented straggler mitigation strategies based on polynomial evaluation and interpolation, but such approaches suffer from numerical problems (blow-up of round-off errors) owing to the high condition numbers of the corresponding Vandermonde matrices. In this work, we introduce a novel approach that embeds distributed matrix computations into the structure of a convolutional code. This simple innovation allows us to develop a provably numerically robust and computationally efficient solution for distributed matrix-vector and matrix-matrix multiplication.
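To make the numerical issue concrete, here is a minimal sketch of decoding in a polynomial-based scheme of the kind the abstract attributes to prior work. It is not the authors' convolutional-code method. Everything in it is an illustrative assumption: the function names `solve` and `decode_error`, the use of scalars as stand-ins for the block products, the equispaced evaluation points in [-1, 1], and the random choice of which workers finish first.

```python
import random

def solve(V, y):
    """Solve V u = y in floating point by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [rhs] for row, rhs in zip(V, y)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    u = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][j] * u[j] for j in range(r + 1, n))
        u[r] = (M[r][n] - s) / M[r][r]
    return u

def decode_error(k, n, seed=0):
    """Encode k unknowns across n workers, decode from k of them, return the max error.

    Assumption: u[j] is a scalar stand-in for the j-th block product A_j v.
    Worker i returns the polynomial sum_j u[j] * x_i**j evaluated at its point x_i.
    """
    rng = random.Random(seed)
    u = [rng.uniform(-1, 1) for _ in range(k)]
    points = [-1 + 2 * i / (n - 1) for i in range(n)]  # assumed evaluation points
    results = [sum(u[j] * x**j for j in range(k)) for x in points]
    fastest = sorted(rng.sample(range(n), k))  # the n - k others are stragglers
    V = [[points[i]**j for j in range(k)] for i in fastest]  # Vandermonde matrix
    y = [results[i] for i in fastest]
    u_hat = solve(V, y)
    return max(abs(a - b) for a, b in zip(u, u_hat))

if __name__ == "__main__":
    for k, n in [(4, 6), (12, 15), (24, 30)]:
        print(f"k={k:2d} n={n:2d} max decode error = {decode_error(k, n):.3e}")
```

Under these assumptions the recovery error grows as k increases, because the Vandermonde system that the decoder must solve becomes increasingly ill-conditioned. The exact magnitudes depend on the evaluation points and on which workers respond.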

research
01/25/2019

Distributed Matrix-Vector Multiplication: A Convolutional Coding Approach

Distributed computing systems are well-known to suffer from the problem ...
research
02/10/2020

Straggler-resistant distributed matrix computation via coding theory

The current BigData era routinely requires the processing of large scale...
research
10/15/2019

Numerically stable coded matrix computations via circulant and rotation matrix embeddings

Several recent works have used coding-theoretic ideas for mitigating the...
research
08/28/2020

Distributed-memory ℋ-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

We introduce a data distribution scheme for ℋ-matrices and a distributed...
research
01/30/2019

Universally Decodable Matrices for Distributed Matrix-Vector Multiplication

Coded computation is an emerging research area that leverages concepts f...
research
01/23/2018

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding

We consider the problem of massive matrix multiplication, which underlie...
research
01/30/2020

Numerically Stable Binary Gradient Coding

A major hurdle in machine learning is scalability to massive datasets. O...
