MapReduce Meets Fine-Grained Complexity: MapReduce Algorithms for APSP, Matrix Multiplication, 3-SUM, and Beyond

05/05/2019
by   MohammadTaghi Hajiaghayi, et al.
0

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require parallelism---since datasets are so large that multiple machines are necessary---and limit the degree of parallelism---since the number of machines grows sublinearly in the size of the data. Although MapReduce is over a dozen years old dean2008mapreduce, many fundamental problems, such as Matrix Multiplication, 3-SUM, and All Pairs Shortest Paths, lack efficient MapReduce algorithms. We study these problems in the MapReduce setting. Our main contribution is to exhibit smooth trade-offs between the memory available on each machine, and the total number of machines necessary for each problem. Overall, we take the memory available to each machine as a parameter, and aim to minimize the number of rounds and number of machines. In this paper, we build on the well-known MapReduce theoretical framework initiated by Karloff, Suri, and Vassilvitskii karloff2010model and give algorithms for many of these problems. The key to efficient algorithms in this setting lies in defining a sublinear number of large (polynomially sized) subproblems, that can then be solved in parallel. We give strategies for MapReduce-friendly partitioning, that result in new algorithms for all of the above problems. Specifically, we give constant round algorithms for the Orthogonal Vector (OV) and 3-SUM problems, and O( n)-round algorithms for Matrix Multiplication, All Pairs Shortest Paths (APSP), and Fast Fourier Transform (FFT), among others. In all of these we exhibit trade-offs between the number of machines and memory per machine.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/27/2021

Efficient distributed algorithms for Convolutional Neural Networks

Several efficient distributed algorithms have been developed for matrix-...
research
04/16/2020

Centralized and Parallel Multi-Source Shortest Paths via Hopsets and Fast Matrix Multiplication

Consider an undirected weighted graph G = (V,E,w). We study the problem ...
research
08/25/2023

SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication

Sparse generalized matrix-matrix multiplication (SpGEMM) is a fundamenta...
research
01/14/2020

Efficient parameterized algorithms for computing all-pairs shortest paths

Computing all-pairs shortest paths is a fundamental and much-studied pro...
research
09/22/2016

Scaling betweenness centrality using communication-efficient sparse matrix multiplication

Betweenness centrality (BC) is a crucial graph problem that measures the...
research
06/27/2020

JAMPI: efficient matrix multiplication in Spark using Barrier Execution Mode

The new barrier mode in Apache Spark allows embedding distributed deep l...
research
08/30/2020

Low-Depth Parallel Algorithms for the Binary-Forking Model without Atomics

The binary-forking model is a parallel computation model, formally defin...

Please sign up or login with your details

Forgot password? Click here to reset