Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication

04/27/2018
by Ankur Mallick, et al.

Large-scale machine learning and data mining applications require computer systems to perform massive computations that need to be parallelized across multiple nodes, for example, massive matrix-vector and matrix-matrix multiplication. The presence of straggling nodes -- computing nodes that unpredictably slow down or fail -- is a major bottleneck in such distributed computations. We propose a rateless fountain coding strategy to alleviate the problem of stragglers in distributed matrix-vector multiplication. Our algorithm creates a stream of linear combinations of the m rows of the matrix and assigns them to different worker nodes, which then perform row-vector products with the encoded rows. The original matrix-vector product can be decoded as soon as slightly more than m row-vector products are collectively finished by the nodes. This strategy enables fast nodes to steal work from slow nodes, without requiring the master to perform any dynamic load balancing. Compared to recently proposed fixed-rate erasure coding strategies, which ignore partial work done by straggling nodes, rateless coding achieves significantly lower overall delay, as well as small computational and decoding overhead.
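The scheme described in the abstract can be sketched in a few dozen lines. This is a minimal illustrative toy, not the authors' implementation: the function names are hypothetical, and it assumes an LT-style ideal soliton degree distribution with a peeling decoder. Each encoded row is the sum of a random subset of rows of A; a worker's result is the inner product of that encoded row with x, and the master decodes once slightly more than m results have arrived.

```python
import numpy as np

# Illustrative sketch of rateless-coded matrix-vector multiplication.
# The ideal soliton distribution and function names are assumptions,
# not the paper's exact construction.

def soliton_degree(m, rng):
    """Draw a degree d from the ideal soliton distribution on {1, ..., m}."""
    p = np.array([1.0 / m] + [1.0 / (d * (d - 1)) for d in range(2, m + 1)])
    return int(rng.choice(np.arange(1, m + 1), p=p / p.sum()))

def encode(A, num_symbols, rng):
    """Each encoded symbol is the sum of a random subset of rows of A."""
    m = A.shape[0]
    symbols = []
    for _ in range(num_symbols):
        d = soliton_degree(m, rng)
        rows = rng.choice(m, size=d, replace=False)
        symbols.append((set(int(r) for r in rows), A[rows].sum(axis=0)))
    return symbols

def peel_decode(received, m):
    """Peeling decoder over (row-index set, scalar product) pairs.

    A degree-1 symbol directly reveals one entry of b = A @ x; that value
    is subtracted from every other symbol containing the same row, which
    can create new degree-1 symbols, and so on until no progress is made.
    """
    b = np.full(m, np.nan)
    work = [(set(s), float(v)) for s, v in received]
    progress = True
    while progress:
        progress = False
        for s, v in work:
            if len(s) == 1:
                i = next(iter(s))
                if np.isnan(b[i]):
                    b[i] = v
                    progress = True
        for idx, (s, v) in enumerate(work):
            known = {i for i in s if not np.isnan(b[i])}
            if known and len(s) > 1:
                work[idx] = (s - known, v - sum(b[i] for i in known))
                progress = True
    return b

# Usage: the master streams encoded rows to workers; each worker returns
# its encoded row's inner product with x. Decoding completes once slightly
# more than m row-vector products have been collected.
rng = np.random.default_rng(0)
m, n = 10, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
received, b = [], np.full(m, np.nan)
while np.isnan(b).any():
    s, enc_row = encode(A, 1, rng)[0]
    received.append((s, enc_row @ x))   # a worker's row-vector product
    b = peel_decode(received, m)
print(np.allclose(b, A @ x))  # True: decoded from ~m (plus overhead) symbols
```

Because the code is rateless, the master never fixes the number of encoded rows in advance; it simply keeps collecting results until the peeling decoder terminates, which is how fast workers implicitly absorb the load of stragglers.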


