Straggler Mitigation through Unequal Error Protection for Distributed Approximate Matrix Multiplication

03/04/2021
by Busra Tegin, et al.

Large-scale machine learning and data mining methods routinely distribute computations across multiple agents to parallelize processing. The time each agent needs to complete its computation depends on the availability of local resources, which gives rise to the "straggler problem". As a remedy, linear coding of the matrix sub-blocks can be used: the Parameter Server (PS) applies a channel code to the matrix sub-blocks and distributes the coded blocks to the workers for multiplication. In this paper, we employ Unequal Error Protection (UEP) codes to obtain an approximation of the matrix product in the distributed computation setting in the presence of stragglers. The resiliency level of each sub-block is chosen according to its norm, since blocks with larger norms have a larger effect on the result of the matrix multiplication. In particular, we consider two approaches to distributing the matrix computation: (i) a row-times-column paradigm, and (ii) a column-times-row paradigm. For both paradigms, we characterize the performance of the proposed approach from a theoretical perspective by bounding the expected reconstruction error for matrices with uncorrelated entries. We also apply the proposed coding strategy to the evaluation of the gradient in the back-propagation step when training a Deep Neural Network (DNN) for an image classification task. Our numerical experiments show that producing matrix product approximations with UEP codes can significantly reduce the overall time required for DNN training to converge.
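The norm-based protection idea in the column-times-row paradigm can be illustrated with a small NumPy sketch. This is not the paper's coding scheme: the abstract does not specify the UEP code construction, so the sketch substitutes plain replication (more copies for blocks with larger norm) as a simple stand-in. The function names, the greedy allocation rule, and the model in which each worker straggles independently with probability `p_straggle` are all assumptions made for illustration.

```python
import numpy as np

def split_blocks(A, B, M):
    """Column-times-row split, so that A @ B == sum_m A_blocks[m] @ B_blocks[m]."""
    return np.array_split(A, M, axis=1), np.array_split(B, M, axis=0)

def allocate_replicas(weights, num_workers):
    """Assumed rule: one replica per block, then greedily give each extra
    worker to the block with the largest weight per replica."""
    weights = np.asarray(weights, dtype=float)
    M = len(weights)
    if num_workers < M:
        raise ValueError("need at least one worker per block")
    replicas = np.ones(M, dtype=int)
    for _ in range(num_workers - M):
        replicas[np.argmax(weights / replicas)] += 1
    return replicas

def approximate_product(A, B, M, num_workers, p_straggle, rng, unequal=True):
    """Approximate A @ B from the block products that arrive in time.

    Replication stands in for a UEP code here (an assumption, not the
    paper's construction). Each worker is modeled as straggling
    independently with probability p_straggle.
    """
    A_blocks, B_blocks = split_blocks(A, B, M)
    # Importance of block product m: product of Frobenius norms.
    importance = np.array(
        [np.linalg.norm(a) * np.linalg.norm(b) for a, b in zip(A_blocks, B_blocks)]
    )
    weights = importance if unequal else np.ones(M)
    replicas = allocate_replicas(weights, num_workers)

    C_hat = np.zeros((A.shape[0], B.shape[1]))
    for a, b, r in zip(A_blocks, B_blocks, replicas):
        # A block product is recovered if at least one replica is not a straggler.
        if (rng.random(int(r)) >= p_straggle).any():
            C_hat += a @ b
    return C_hat, replicas
```

Calling `approximate_product` with `unequal=False` spreads the workers evenly across blocks, which gives an equal-protection baseline to compare against: for matrices whose blocks have very different norms, one would expect the norm-weighted allocation to lose less of the product on average, though this toy model says nothing about the paper's actual error bounds.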


research
11/05/2020

Straggler Mitigation through Unequal Error Protection for Distributed Matrix Multiplication

Large-scale machine learning and data mining methods routinely distribut...
research
05/12/2022

Sparse Random Khatri-Rao Product Codes for Distributed Matrix Multiplication

We introduce two generalizations to the paradigm of using Random Khatri-...
research
04/27/2018

Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication

Large-scale machine learning and data mining applications require comput...
research
05/05/2021

ε-Approximate Coded Matrix Multiplication is Nearly Twice as Efficient as Exact Multiplication

We study coded distributed matrix multiplication from an approximate rec...
research
11/06/2018

Erasure coding for distributed matrix multiplication for matrices with bounded entries

Distributed matrix multiplication is widely used in several scientific d...
research
01/20/2020

Bivariate Polynomial Coding for Exploiting Stragglers in Heterogeneous Coded Computing Systems

Polynomial coding has been proposed as a solution to the straggler mitig...
research
01/31/2021

Linear Computation Coding

We introduce the new concept of computation coding. Similar to how rate-...
