A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake

04/24/2017
by Berenger Bramas, et al.

The design of modern CPUs, which combines hierarchical memory with SIMD/vectorization capability, governs how efficiently algorithms can be implemented. The release of the AVX-512 instruction set changed this picture radically and motivated us to search for an efficient sorting algorithm that can take advantage of it. In this paper, we describe the best strategy we have found: a novel two-part hybrid sort based on the well-known Quicksort algorithm. The central partitioning operation is performed by a new algorithm, and small partitions/arrays are sorted using a branch-free Bitonic-based sort. This study also illustrates how classical algorithms can be adapted and enhanced by the AVX-512 extension. We evaluate the performance of our approach on a modern Intel Xeon Skylake and assess the different layers of our implementation by sorting/partitioning integers, double-precision floating-point numbers, and key/value pairs of integers. Our results demonstrate that our approach is faster than two reference libraries: the GNU C++ sort algorithm, by a speedup factor of 4, and the Intel IPP library, by a speedup factor of 1.4.
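
To make the two-part structure concrete, the following is a minimal scalar sketch in C++. It is not the AVX-512 implementation from the paper. It only illustrates the general shape of a hybrid sort in which Quicksort partitioning handles large ranges and a bitonic sorting network handles small ones. The names `hybridSort`, `smallSort`, `bitonicSort16`, and `compareExchange`, the threshold `kSmallSize = 16`, the middle-element pivot, and the use of `std::partition` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Hypothetical threshold: ranges of at most this many elements
// are handed to the bitonic network instead of being partitioned.
constexpr std::size_t kSmallSize = 16;

// Compare-exchange expressed with min/max, so there is no
// data-dependent branch in the source; a compiler may lower it to
// branch-free instructions.
inline void compareExchange(int& lowSlot, int& highSlot) {
    const int lo = std::min(lowSlot, highSlot);
    const int hi = std::max(lowSlot, highSlot);
    lowSlot = lo;
    highSlot = hi;
}

// Classic bitonic sorting network on exactly 16 values.
// The index tests below depend only on loop counters, not on the data.
void bitonicSort16(int* v) {
    for (std::size_t k = 2; k <= kSmallSize; k *= 2) {
        for (std::size_t j = k / 2; j > 0; j /= 2) {
            for (std::size_t i = 0; i < kSmallSize; ++i) {
                const std::size_t l = i ^ j;
                if (l > i) {
                    if ((i & k) == 0) {
                        compareExchange(v[i], v[l]);  // ascending block
                    } else {
                        compareExchange(v[l], v[i]);  // descending block
                    }
                }
            }
        }
    }
}

// Sort n <= kSmallSize values by padding to 16 with the largest int,
// running the network, and copying the first n values back.
void smallSort(int* data, std::size_t n) {
    int buffer[kSmallSize];
    std::fill(buffer, buffer + kSmallSize, std::numeric_limits<int>::max());
    std::copy(data, data + n, buffer);
    bitonicSort16(buffer);
    std::copy(buffer, buffer + n, data);
}

// Hybrid sort: three-way Quicksort partitioning for large ranges,
// bitonic network for small ones.
void hybridSort(int* data, std::size_t n) {
    if (n <= kSmallSize) {
        smallSort(data, n);
        return;
    }
    const int pivot = data[n / 2];
    int* end = data + n;
    // Values below the pivot first, then values equal to it.
    int* lessEnd = std::partition(data, end,
                                  [pivot](int x) { return x < pivot; });
    int* equalEnd = std::partition(lessEnd, end,
                                   [pivot](int x) { return x == pivot; });
    // The equal block holds at least the pivot, so both recursive
    // calls work on strictly smaller ranges.
    hybridSort(data, static_cast<std::size_t>(lessEnd - data));
    hybridSort(equalEnd, static_cast<std::size_t>(end - equalEnd));
}
```

Calling `hybridSort(values.data(), values.size())` on a `std::vector<int>` sorts it in ascending order. In a vectorized version, the compare-exchange steps of the network and the partitioning loop are the parts that would map onto SIMD instructions; this sketch keeps them scalar for readability.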
