Engineering In-place (Shared-memory) Sorting Algorithms

09/28/2020
by   Michael Axtmann, et al.
0

We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient. We also parallelize this approach taking dynamic load balancing and memory locality into account. Our comparison-based algorithm, In-place Superscalar Samplesort (IPS^4o), combines this technique with branchless decision trees. By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best in-place parallel comparison-based competitor by almost a factor of three. IPS^4o also outperforms the best comparison-based competitors in the in-place or not in-place, parallel or sequential settings. IPS^4o even outperforms the best integer sorting algorithms in a wide range of situations. In many of the remaining cases (often involving near-uniform input distributions, small keys, or a sequential setting), our new in-place radix sorter turns out to be the best algorithm. Claims to have the, in some sense, "best" sorting algorithm can be found in many papers which cannot all be true. Therefore, we base our conclusions on extensive experiments involving a large part of the cross product of 21 state-of-the-art sorting codes, 6 data types, 10 input distributions, 4 machines, 4 memory allocation strategies, and input sizes varying over 7 orders of magnitude. This confirms the robust performance of our algorithms while revealing major performance problems in many competitors outside the concrete set of measurements reported in the associated publications.

READ FULL TEXT
research
08/14/2022

Towards Parallel Learned Sorting

We introduce a new sorting algorithm that is the combination of ML-enhan...
research
04/20/2023

High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems

Semisort is a fundamental algorithmic primitive widely used in the desig...
research
04/27/2020

In-Place Parallel-Partition Algorithms using Exclusive-Read-and-Write Memory: An In-Place Algorithm With Provably Optimal Cache Behavior

We present an in-place algorithm for the parallel partition problem that...
research
11/03/2018

QuickXsort - A Fast Sorting Scheme in Theory and Practice

QuickXsort is a highly efficient in-place sequential sorting scheme that...
research
07/05/2021

Defeating duplicates: A re-design of the LearnedSort algorithm

LearnedSort is a novel sorting algorithm that, unlike traditional method...
research
08/21/2019

Engineering Faster Sorters for Small Sets of Items

Sorting a set of items is a task that can be useful by itself or as a bu...
research
04/12/2017

Parallelized Kendall's Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

Pairwise association measure is an important operation in data analytics...

Please sign up or login with your details

Forgot password? Click here to reset