Locally Random Alloy Codes with Channel Coding Theorems for Distributed Matrix Multiplication

02/07/2022
by Pedro Soto, et al.

Matrix multiplication is a fundamental operation in machine learning and is commonly distributed into multiple parallel tasks for large datasets. Stragglers and other failures can severely impact the overall completion time. Recent works in coded computing provide a novel strategy to mitigate stragglers with coded tasks, with the objective of minimizing the number of tasks needed to recover the overall result, known as the recovery threshold. However, we demonstrate that this combinatorial definition does not directly optimize the probability of failure. In this paper, we introduce a novel analytical metric, which focuses on the most likely event and measures the optimality of a coding scheme by its probability of decoding. Our general framework encompasses many other computational schemes and metrics as special cases. Far from being a purely theoretical construction, these definitions lead us to a practical construction of random codes for matrix multiplication, i.e., locally random alloy codes, which are optimal with respect to these measures. We present experimental results on Amazon EC2 which empirically demonstrate the improvement in running time and numerical stability relative to well-established benchmarks.
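
The gap between the recovery threshold and the probability of decoding can be illustrated with a small numerical sketch. It is not the construction from the paper: it assumes a simplified model in which each of n tasks fails independently with probability `p_fail`, and it describes a scheme only by the fraction of j-task subsets from which the result can be decoded. Both decodability profiles below, and the names `decode_probability` and `worst_case_threshold`, are hypothetical values and helpers chosen for illustration.

```python
from math import comb


def decode_probability(decodable_fraction, p_fail):
    """Probability that the finished tasks suffice to decode.

    decodable_fraction[j] is the (assumed) fraction of j-task subsets
    that are decodable, for j = 0..n. Tasks are assumed to fail
    independently with probability p_fail, which is an illustrative
    model and not necessarily the one used in the paper.
    """
    n = len(decodable_fraction) - 1
    p_ok = 1.0 - p_fail
    return sum(
        comb(n, j) * p_ok**j * p_fail ** (n - j) * decodable_fraction[j]
        for j in range(n + 1)
    )


def worst_case_threshold(decodable_fraction):
    """Smallest k such that every subset of at least k tasks decodes."""
    n = len(decodable_fraction) - 1
    k = n + 1
    for j in range(n, -1, -1):
        if decodable_fraction[j] < 1.0:
            break
        k = j
    return k


# Hypothetical scheme X: any 4 of 6 tasks decode, fewer never do.
scheme_x = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
# Hypothetical scheme Y: worst case needs 5 tasks, but most subsets
# of 3 or 4 tasks already decode (made-up fractions).
scheme_y = [0.0, 0.0, 0.0, 0.9, 0.98, 1.0, 1.0]

p_fail = 0.2
k_x = worst_case_threshold(scheme_x)
k_y = worst_case_threshold(scheme_y)
p_x = decode_probability(scheme_x, p_fail)
p_y = decode_probability(scheme_y, p_fail)

print(f"scheme X: threshold={k_x}, P(decode)={p_x:.4f}")
print(f"scheme Y: threshold={k_y}, P(decode)={p_y:.4f}")
```

Under these assumed numbers, scheme Y has the larger worst-case threshold yet the higher probability of decoding, which is the kind of mismatch between the combinatorial metric and the probabilistic one that the abstract describes.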

research
07/25/2019

Factored LT and Factored Raptor Codes for Large-Scale Distributed Matrix Multiplication

We propose two coding schemes for distributed matrix multiplication in t...
research
05/16/2019

Random Sampling for Distributed Coded Matrix Multiplication

Matrix multiplication is a fundamental building block for large scale co...
research
09/19/2023

Distributed Matrix Multiplication with a Smaller Recovery Threshold through Modulo-based Approaches

This paper considers the problem of calculating the matrix multiplicatio...
research
05/05/2021

ε-Approximate Coded Matrix Multiplication is Nearly Twice as Efficient as Exact Multiplication

We study coded distributed matrix multiplication from an approximate rec...
research
05/13/2021

Variable Coded Batch Matrix Multiplication

In this paper, we introduce the Variable Coded Distributed Batch Matrix ...
research
01/23/2019

Distributed and Private Coded Matrix Computation with Flexible Communication Load

Tensor operations, such as matrix multiplication, are central to large-s...
research
05/17/2021

Price of Precision in Coded Distributed Matrix Multiplication: A Dimensional Analysis

Coded distributed matrix multiplication (CDMM) schemes, such as MatDot c...
