JAMPI: efficient matrix multiplication in Spark using Barrier Execution Mode

06/27/2020
by Tamas Foldi, et al.

The new barrier mode in Apache Spark allows embedding distributed deep learning training as a Spark stage to simplify the distributed training workflow. In Spark, a task in a stage does not depend on any other task in the same stage, and hence it can be scheduled independently. However, several algorithms require more sophisticated inter-task communication, similar to the MPI paradigm. By combining distributed message passing (using asynchronous network IO), OpenJDK's new auto-vectorization, and Spark's barrier execution mode, we can add non-map/reduce-based algorithms, such as Cannon's distributed matrix multiplication, to Spark. We document an efficient distributed matrix multiplication using Cannon's algorithm, which improves significantly on the performance of the existing MLlib implementation. Used within a barrier task, the algorithm described herein results in an up to 24 percent performance increase on a 10,000x10,000 square matrix with a significantly lower memory footprint. Applications of efficient matrix multiplication include, among others, accelerating the training and inference of deep convolutional neural network-based workloads, and thus such efficient algorithms can play a ground-breaking role in the faster, more efficient execution of even the most complicated machine learning tasks.

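As a rough sketch of how Cannon's algorithm can be hosted inside a barrier stage (not the paper's actual JAMPI implementation), the Scala fragment below uses only Spark's public `RDD.barrier()` and `BarrierTaskContext` APIs. The `GridBlock` type and the injected `shiftLeft`/`shiftUp` functions are assumptions standing in for JAMPI's asynchronous network IO block-exchange layer, which Spark itself does not provide.

```scala
import org.apache.spark.BarrierTaskContext
import org.apache.spark.rdd.RDD

/** One task's (n/p x n/p) blocks of A and B, plus its (row, col) position
  * on the p x p process grid. Hypothetical type, for illustration only. */
case class GridBlock(row: Int, col: Int, a: Array[Double], b: Array[Double])

/** Cannon's algorithm expressed as a single barrier stage.
  * `shiftLeft` / `shiftUp` are hypothetical stand-ins for the asynchronous
  * task-to-task block exchange that JAMPI implements over network IO;
  * Spark only supplies the gang scheduling and the barrier() synchronisation. */
def cannonMultiply(
    blocks: RDD[GridBlock],
    p: Int,
    dim: Int,
    shiftLeft: (BarrierTaskContext, Array[Double], Int) => Array[Double],
    shiftUp: (BarrierTaskContext, Array[Double], Int) => Array[Double]
): RDD[(Int, Int, Array[Double])] =
  blocks.barrier().mapPartitions { iter =>
    val ctx   = BarrierTaskContext.get()
    val block = iter.next()
    var a = block.a
    var b = block.b
    val c = new Array[Double](dim * dim)

    // Initial skew: row i of A moves i blocks left, column j of B moves j blocks up.
    a = shiftLeft(ctx, a, block.row)
    b = shiftUp(ctx, b, block.col)
    ctx.barrier()

    for (_ <- 0 until p) {
      // Local block multiply-accumulate C += A * B, in an i-k-j loop order
      // that the JIT can auto-vectorize.
      var i = 0
      while (i < dim) {
        var k = 0
        while (k < dim) {
          val aik = a(i * dim + k)
          var j = 0
          while (j < dim) {
            c(i * dim + j) += aik * b(k * dim + j)
            j += 1
          }
          k += 1
        }
        i += 1
      }
      // Shift A one block left and B one block up, then wait for the whole grid.
      a = shiftLeft(ctx, a, 1)
      b = shiftUp(ctx, b, 1)
      ctx.barrier()
    }
    Iterator((block.row, block.col, c))
  }
```

The `ctx.barrier()` call after every shift keeps all p shift-and-multiply rounds in lockstep across the task grid, which is exactly the kind of inter-task coordination that an ordinary map/reduce stage cannot express.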