In-Place Parallel-Partition Algorithms using Exclusive-Read-and-Write Memory: An In-Place Algorithm With Provably Optimal Cache Behavior

04/27/2020
by   William Kuszmaul, et al.

We present an in-place algorithm for the parallel partition problem that has linear work and polylogarithmic span. The algorithm uses only exclusive read/write shared variables and can be implemented using parallel-for-loops without any additional concurrency considerations (i.e., the algorithm is EREW). A key feature of the algorithm is that it exhibits provably optimal cache behavior, up to small-order factors. We also present a second in-place EREW algorithm that has linear work and span O(log n · log log n), which is within an O(log log n) factor of the optimal span. By using this low-span algorithm as a subroutine within the cache-friendly algorithm, we obtain a single EREW algorithm that combines their theoretical guarantees: the algorithm achieves span O(log n · log log n) and optimal cache behavior. As an immediate consequence, we also get an in-place EREW quicksort algorithm with work O(n log n) and span O(log^2 n · log log n). Whereas the standard EREW algorithm for parallel partitioning is memory-bandwidth bound on large numbers of cores, our cache-friendly algorithm achieves near-ideal scaling in practice by avoiding the memory-bandwidth bottleneck. The algorithm's performance is comparable to that of the Blocked Strided Algorithm of Francis, Pannan, Frias, and Petit, which is the previous state of the art for parallel EREW sorting algorithms, but which lacks theoretical guarantees on its span and cache behavior.
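For readers unfamiliar with the partition problem, the following is a minimal sketch of the textbook prefix-sum approach to partitioning, which we assume is what the abstract means by the standard EREW algorithm. It is not the in-place algorithm of the paper: it writes to a separate output array, and the loops that would be parallel-for-loops are simulated sequentially. The function name `partition_by_prefix_sums` and its signature are hypothetical, chosen for illustration only.

```python
from itertools import accumulate


def partition_by_prefix_sums(a, pivot):
    """Out-of-place partition: elements <= pivot first, then the rest.

    Illustrative sketch only (hypothetical helper, not the paper's algorithm).
    Returns the partitioned list and the number of elements <= pivot.
    """
    n = len(a)
    # Step 1 (a parallel-for in a real implementation): flag small elements.
    flags = [1 if x <= pivot else 0 for x in a]
    # Step 2: inclusive prefix sums over the flags. A parallel scan would do
    # this with linear work; here it is computed sequentially.
    prefix = list(accumulate(flags))
    num_small = prefix[-1] if n else 0
    # Step 3 (a parallel-for): every element writes to a distinct output
    # slot, so no two iterations touch the same cell.
    out = [None] * n
    for i, x in enumerate(a):
        if flags[i]:
            out[prefix[i] - 1] = x
        else:
            # i - prefix[i] counts the large elements strictly before index i.
            out[num_small + (i - prefix[i])] = x
    return out, num_small


if __name__ == "__main__":
    result, split = partition_by_prefix_sums([5, 1, 8, 3, 9, 2], pivot=4)
    print(result, split)
```

Each pass in this sketch streams over the whole array, and the flag, prefix, and output arrays add extra memory traffic. That kind of repeated full-array traffic is consistent with the memory-bandwidth bottleneck the abstract attributes to the standard approach.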


