Low-Depth Parallel Algorithms for the Binary-Forking Model without Atomics

08/30/2020
by Zafar Ahmad, et al.

The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread recursively and asynchronously, and spawning or synchronizing n tasks or threads incurs a cost of Θ(log n). The model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not account for the cost of spawning and synchronizing threads, so algorithms that achieve optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Algorithms often need to be redesigned to achieve optimal bounds in the binary-forking model, and the non-constant synchronization cost makes this task challenging. Although the binary-forking model allows atomic test-and-set (TS) instructions to reduce some synchronization overhead, assuming the availability of such instructions places a stronger requirement on the hardware and may limit the portability of algorithms that use them. We therefore avoid locks and atomic instructions in our algorithms, except possibly inside the join operation, which is implemented by the runtime system. In this paper, we design efficient parallel algorithms in the binary-forking model without atomics for three fundamental problems: Strassen's (and Strassen-like) matrix multiplication (MM), comparison-based sorting, and the Fast Fourier Transform (FFT). All our results improve over the known results for the corresponding problems in the binary-forking model, both with and without atomics.
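The Θ(log n) spawn/sync cost arises because the model only permits binary forks: launching n tasks requires a recursive binary splitting of depth Θ(log n), and joining them unwinds the same tree. A minimal sketch of this fork/join pattern, using Python threads purely for illustration (the paper's algorithms target fork/join runtimes such as Cilk, and `parallel_for` here is an illustrative name, not an API from the paper):

```python
import threading

def parallel_for(lo, hi, body):
    # Binary forking: recursively split [lo, hi) in half, forking one
    # child thread per split. Spawning n = hi - lo tasks therefore takes
    # Theta(log n) fork depth, and the joins unwind with Theta(log n)
    # synchronization depth, matching the model's cost assumption.
    if hi - lo == 1:
        body(lo)
        return
    mid = (lo + hi) // 2
    child = threading.Thread(target=parallel_for, args=(lo, mid, body))
    child.start()                # fork the left half asynchronously
    parallel_for(mid, hi, body)  # continue on the right half in this thread
    child.join()                 # synchronize with the forked child

# Usage: compute squares of 0..7 in parallel, one task per index.
results = [0] * 8
parallel_for(0, 8, lambda i: results.__setitem__(i, i * i))
# results == [0, 1, 4, 9, 16, 25, 36, 49]
```

Note that each task writes a distinct slot of `results`, so no locks or atomic instructions are needed; this is exactly the discipline the paper's algorithms maintain outside the runtime-implemented join.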


