Leyenda: An Adaptive, Hybrid Sorting Algorithm for Large Scale Data with Limited Memory

09/17/2019
by Yuanjing Shi, et al.

Sorting is one of the fundamental tasks of modern data management systems. Disk I/O is the most commonly blamed performance bottleneck, but as workloads become more computation-intensive, we observe that in heterogeneous environments the bottleneck can vary from one infrastructure to another. As a result, sort kernels need to adapt to changing hardware conditions. In this paper, we propose Leyenda, a hybrid, parallel and efficient Radix Most-Significant-Bit (MSB) MergeSort algorithm that makes use of local thread-level CPU cache and efficient disk/memory I/O. Leyenda can perform either internal or external sort efficiently, depending on the I/O and processing conditions. We benchmarked Leyenda with three different workloads from Sort Benchmark, targeting three distinct use cases (internal, partially in-memory and external sort), and found that Leyenda outperforms GNU's parallel in-memory quick/merge sort implementations by up to three times. Leyenda also ranked second among external sort algorithms in the ACM SIGMOD 2019 programming contest and fourth overall.
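To make the MSB radix idea mentioned in the abstract concrete, here is a minimal sketch of most-significant-byte partitioning with a comparison-sort fallback for small buckets. It is not Leyenda's implementation: the function name `msb_radix_sort`, the `SMALL_BUCKET` cutoff and the use of Python's built-in `sorted` are assumptions made for illustration, and the sketch leaves out the parallelism, cache-aware layout, merge phase and disk I/O that the abstract describes.

```python
from typing import List

# Hypothetical cutoff: buckets at or below this size use a comparison sort.
SMALL_BUCKET = 32


def msb_radix_sort(keys: List[bytes], depth: int = 0) -> List[bytes]:
    """Sort byte-string keys by partitioning on the byte at position `depth`."""
    if len(keys) <= SMALL_BUCKET:
        # Hybrid step: small inputs are cheaper to sort by comparison.
        return sorted(keys)

    # Keys with no byte at `depth` are all equal to the shared prefix,
    # so they go first and need no further sorting.
    finished: List[bytes] = []
    buckets: List[List[bytes]] = [[] for _ in range(256)]
    for key in keys:
        if len(key) <= depth:
            finished.append(key)
        else:
            buckets[key[depth]].append(key)

    # Buckets are visited in byte order, so concatenating the recursively
    # sorted buckets yields a fully sorted result.
    out = finished
    for bucket in buckets:
        if bucket:
            out.extend(msb_radix_sort(bucket, depth + 1))
    return out
```

Because each bucket is independent after partitioning, the per-bucket sorts are a natural unit of parallel work, which is one reason MSB partitioning is attractive for a parallel sort kernel.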

