Distributed Matrix Computations with Low-weight Encodings

01/30/2023
by Anindya Bijoy Das, et al.

Straggler nodes are well-known bottlenecks in distributed matrix computations: they reduce computation and communication speeds. A common strategy for mitigating stragglers is to incorporate Reed-Solomon based MDS (maximum distance separable) codes into the framework; this can achieve resilience against an optimal number of stragglers. However, these codes assign dense linear combinations of submatrices to the worker nodes. When the input matrices are sparse, these approaches increase the number of non-zero entries in the encoded matrices, which in turn adversely affects the worker computation time. In this work, we develop a distributed matrix computation approach where the assigned encoded submatrices are random linear combinations of a small number of submatrices. In addition to being well suited for sparse input matrices, our approach continues to have the optimal straggler resilience in a certain range of problem parameters. Moreover, compared to recent sparse matrix computation approaches, the search for a “good” set of random coefficients to promote numerical stability in our method is much more computationally efficient. We show that our approach can efficiently utilize partial computations done by slower worker nodes in a heterogeneous system, which can enhance the overall computation speed. Numerical experiments conducted through Amazon Web Services (AWS) demonstrate up to 30% reduction in computation time and 100x faster encoding compared to the available methods.
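The sparsity argument in the abstract can be illustrated with a minimal Python sketch. It is not the authors' scheme: the cyclic choice of submatrices, the coefficient range, and every name (`split_block_columns`, `encode`, `nnz`, `random_sparse`, `weight`) are assumptions made for illustration only. It compares a dense encoding, where each worker receives a combination of all k submatrices, with a low-weight encoding, where each worker combines only a few.

```python
import random


def split_block_columns(A, k):
    """Split matrix A (a list of rows) into k block-columns of equal width."""
    n_cols = len(A[0])
    assert n_cols % k == 0, "this sketch assumes k divides the column count"
    width = n_cols // k
    return [[row[j * width:(j + 1) * width] for row in A] for j in range(k)]


def encode(blocks, n_workers, weight, rng):
    """Give worker i a random linear combination of `weight` blocks.

    Assumption: worker i combines blocks i, i+1, ..., i+weight-1 (mod k).
    This cyclic rule is illustrative, not the paper's assignment.
    """
    k = len(blocks)
    rows, cols = len(blocks[0]), len(blocks[0][0])
    encoded = []
    for i in range(n_workers):
        chosen = [(i + t) % k for t in range(weight)]
        # Coefficients drawn away from zero so no block is dropped.
        coeffs = [rng.uniform(1.0, 2.0) for _ in chosen]
        combo = [[sum(c * blocks[j][r][q] for c, j in zip(coeffs, chosen))
                  for q in range(cols)]
                 for r in range(rows)]
        encoded.append(combo)
    return encoded


def nnz(M):
    """Count the non-zero entries of a matrix stored as a list of rows."""
    return sum(1 for row in M for x in row if x != 0)


def random_sparse(rows, cols, density, rng):
    """Build a random sparse matrix with positive non-zero entries."""
    return [[rng.uniform(1.0, 2.0) if rng.random() < density else 0.0
             for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
A = random_sparse(40, 60, 0.05, rng)
k, n_workers = 6, 8
blocks = split_block_columns(A, k)

dense = encode(blocks, n_workers, weight=k, rng=rng)  # combines all k blocks
low = encode(blocks, n_workers, weight=2, rng=rng)    # combines only 2 blocks

print("avg nnz per block       :", sum(nnz(b) for b in blocks) / k)
print("avg nnz, dense encoding :", sum(nnz(M) for M in dense) / n_workers)
print("avg nnz, weight-2 coding:", sum(nnz(M) for M in low) / n_workers)
```

Because each encoded submatrix is non-zero wherever any of its constituent submatrices is, combining fewer submatrices keeps the worker's input sparser. The sketch covers only the encoding side; decoding and the choice of coefficients for numerical stability are left out.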

research
08/08/2023

Preserving Sparsity and Privacy in Straggler-Resilient Distributed Matrix Computations

Existing approaches to distributed matrix computations involve allocatin...
research
12/11/2020

Coded sparse matrix computation schemes that leverage partial stragglers

Distributed matrix computations over large clusters can suffer from the ...
research
09/24/2021

A Unified Treatment of Partial Stragglers and Sparse Matrices in Coded Matrix Computation

The overall execution time of distributed matrix computations is often d...
research
05/12/2022

Sparse Random Khatri-Rao Product Codes for Distributed Matrix Multiplication

We introduce two generalizations to the paradigm of using Random Khatri-...
research
04/26/2023

Coded matrix computation with gradient coding

Polynomial based approaches, such as the Mat-Dot and entangled polynomia...
research
01/12/2020

Heterogeneous Computation Assignments in Coded Elastic Computing

We study the optimal design of a heterogeneous coded elastic computing (...
research
08/12/2020

Coded Elastic Computing on Machines with Heterogeneous Storage and Computation Speed

We study the optimal design of heterogeneous Coded Elastic Computing (CE...
