Vectorized and performance-portable Quicksort

05/12/2022
by Mark Blacher, et al.

Recent work has shown that implementations of Quicksort using vector CPU instructions can outperform the non-vectorized algorithms in widespread use. However, these implementations are typically single-threaded, implemented for a particular instruction set, and restricted to a small set of key types. We lift these three restrictions: our proposed 'vqsort' algorithm integrates into the state-of-the-art parallel sorter 'ips4o', with a geometric mean speedup of 1.59. The same implementation works on seven instruction sets (including SVE and RISC-V V) across four platforms. It also supports floating-point and 16- to 128-bit integer keys. To the best of our knowledge, this is the fastest sort for non-tuple keys on CPUs, up to 20 times as fast as the sorting algorithms implemented in standard libraries. This paper focuses on the practical engineering aspects enabling the speed and portability, which we have not yet seen demonstrated for a Quicksort implementation. Furthermore, we introduce compact and transpose-free sorting networks for in-register sorting of small arrays, and a vector-friendly pivot sampling strategy that is robust against adversarial input.

research
04/24/2017

Fast Sorting Algorithms using AVX-512 on Intel Knights Landing

This paper describes fast sorting techniques using the recent AVX-512 in...
research
05/17/2021

A fast vectorized sorting implementation based on the ARM scalable vector extension (SVE)

The way developers implement their algorithms and how these implementati...
research
04/24/2017

A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake

The modern CPU's design, which is composed of hierarchical memory and SI...
research
08/21/2019

Engineering Faster Sorters for Small Sets of Items

Sorting a set of items is a task that can be useful by itself or as a bu...
research
04/20/2023

High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems

Semisort is a fundamental algorithmic primitive widely used in the desig...
research
06/03/2023

Optimized Vectorization Implementation of CRYSTALS-Dilithium

CRYSTALS-Dilithium is a lattice-based signature scheme to be standardize...
research
03/03/2018

Histogram Sort with Sampling

To minimize data movement, state-of-the-art parallel sorting algorithms ...
