Leyenda: An Adaptive, Hybrid Sorting Algorithm for Large Scale Data with Limited Memory

09/17/2019
by Yuanjing Shi, et al.

Sorting is one of the fundamental tasks of modern data management systems. Disk I/O is the most commonly blamed performance bottleneck, but as workloads become more computation-intensive, the bottleneck in heterogeneous environments can vary from one infrastructure to another. As a result, sort kernels need to adapt to changing hardware conditions. In this paper, we propose Leyenda, a hybrid, parallel, and efficient Radix Most-Significant-Bit (MSB) MergeSort algorithm that makes use of local thread-level CPU cache and efficient disk/memory I/O. Leyenda can perform either internal or external sort efficiently, depending on I/O and processing conditions. We benchmarked Leyenda with three different workloads from Sort Benchmark, targeting three distinct use cases (internal, partially in-memory, and external sort), and found that Leyenda outperforms GNU's parallel in-memory quick/merge sort implementations by up to three times. Leyenda also ranked second among external sort algorithms in the ACM SIGMOD 2019 programming contest and fourth overall.
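The general idea of combining an MSB radix pass with a comparison sort can be illustrated with a minimal sketch. This is not the authors' implementation: the names `msb_partition`, `hybrid_sort`, and `KEY_LEN`, the 10-byte key width, and the worker count are all assumptions made for illustration, and Python's built-in `sorted` (a merge-sort variant) stands in for the per-bucket sort. Records are scattered into 256 buckets by their most significant key byte, each bucket is sorted independently, and the buckets are concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor

KEY_LEN = 10  # assumed key width in bytes (hypothetical; not taken from the abstract)


def msb_partition(records):
    """Scatter non-empty byte records into 256 buckets by their first key byte."""
    buckets = [[] for _ in range(256)]
    for rec in records:
        buckets[rec[0]].append(rec)  # indexing bytes yields an int in 0..255
    return buckets


def sort_bucket(bucket):
    """Sort one bucket by key; the built-in sort stands in for a merge sort."""
    return sorted(bucket, key=lambda r: r[:KEY_LEN])


def hybrid_sort(records, workers=4):
    """Partition by the top key byte, sort buckets independently, concatenate."""
    buckets = msb_partition(records)
    # Buckets cover disjoint key ranges, so they can be sorted independently.
    # Threads here only illustrate that independence.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_buckets = list(pool.map(sort_bucket, buckets))
    out = []
    for bucket in sorted_buckets:
        out.extend(bucket)
    return out
```

Because the radix pass already orders the buckets relative to one another, no final cross-bucket merge is needed in this sketch; an external variant would write each bucket to disk and sort it once it fits in memory.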


