
In-Place Parallel-Partition Algorithms using Exclusive-Read-and-Write Memory: An In-Place Algorithm With Provably Optimal Cache Behavior

by William Kuszmaul, et al.

We present an in-place algorithm for the parallel partition problem that has linear work and polylogarithmic span. The algorithm uses only exclusive read/write shared variables, and can be implemented using parallel-for-loops without any additional concurrency considerations (i.e., the algorithm is EREW). A key feature of the algorithm is that it exhibits provably optimal cache behavior, up to small-order factors. We also present a second in-place EREW algorithm that has linear work and span O(log n · log log n), which is within an O(log log n) factor of the optimal span. By using this low-span algorithm as a subroutine within the cache-friendly algorithm, we obtain a single EREW algorithm that combines their theoretical guarantees: the algorithm achieves span O(log n · log log n) and optimal cache behavior. As an immediate consequence, we also get an in-place EREW quicksort algorithm with work O(n log n) and span O(log^2 n · log log n). Whereas the standard EREW algorithm for parallel partitioning is memory-bandwidth bound on large numbers of cores, our cache-friendly algorithm achieves near-ideal scaling in practice by avoiding the memory-bandwidth bottleneck. The algorithm's performance is comparable to that of the Blocked Strided Algorithm of Francis, Pannan, Frias, and Petit, which is the previous state of the art for parallel EREW sorting algorithms, but which lacks theoretical guarantees on its span and cache behavior.
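For orientation, the partition problem itself can be illustrated with a minimal sketch of the textbook prefix-sum approach. This is not the in-place algorithm of the paper: it uses an auxiliary output array, and the loops are written sequentially, although each one is a flag, scan, or scatter step that can be run as a parallel-for-loop without concurrent access to any cell. The function name `partition_by_prefix_sums` and the convention that elements less than or equal to the pivot go first are assumptions made for illustration.

```python
from itertools import accumulate


def partition_by_prefix_sums(values, pivot):
    """Out-of-place partition sketch (hypothetical helper, not the paper's method).

    Returns (out, num_low): `out` holds all elements <= pivot first, then the
    rest, with relative order preserved; `num_low` is the size of the first group.
    """
    n = len(values)

    # Step 1: flag each element (independent per index, so parallelizable).
    is_low = [1 if v <= pivot else 0 for v in values]

    # Step 2: inclusive prefix sums of the flags. Sequential here; a parallel
    # scan would compute the same array with logarithmic span.
    low_rank = list(accumulate(is_low))
    num_low = low_rank[-1] if n else 0

    # Step 3: scatter. Each element writes to a distinct output cell, so no
    # two iterations touch the same location.
    out = [None] * n
    for i, v in enumerate(values):
        if is_low[i]:
            out[low_rank[i] - 1] = v
        else:
            # i + 1 - low_rank[i] high elements occur at or before index i.
            out[num_low + (i - low_rank[i])] = v
    return out, num_low


if __name__ == "__main__":
    print(partition_by_prefix_sums([5, 1, 8, 3, 9, 2], 4))
```

The auxiliary arrays `is_low`, `low_rank`, and `out` mean this sketch is not in-place and makes several passes over memory, which is the kind of cost the abstract attributes to the standard approach.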




Data Oblivious Algorithms for Multicores

As secure processors such as Intel SGX (with hyperthreading) become wide...

A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle

Task parallelism is designed to simplify the task of parallel programmin...

Analysis of Work-Stealing and Parallel Cache Complexity

Parallelism has become extremely popular over the past decade, and there...

Engineering In-place (Shared-memory) Sorting Algorithms

We present sorting algorithms that represent the fastest known technique...

GPU-friendly, Parallel, and (Almost-)In-Place Construction of Left-Balanced k-d Trees

We present an algorithm that allows for building left-balanced and compl...

Taurus: Lightweight Parallel Logging for In-Memory Database Management Systems (Extended Version)

Existing single-stream logging schemes are unsuitable for in-memory data...

Optimal Multithreaded Batch-Parallel 2-3 Trees

This paper presents a batch-parallel 2-3 tree T in the asynchronous PPM ...