Implementing the Comparison-Based External Sort

07/26/2022
by Michael Polyntsov, et al.

In the age of big data, sorting is an indispensable operation for DBMSes and similar systems. Sorted data can help produce query plans with significantly lower run times. It can also provide other benefits, such as non-blocking operators that produce data steadily (without bursts) and operators with a reduced memory footprint. Sorting may be required at any step of query processing, whether on source data or on intermediate results. At the same time, the data to be sorted may not fit into main memory. In this case, an external sort operator, which writes intermediate results to disk, should be used. In this paper we consider a comparison-based external sort operator. We discuss its implementation and describe the related design decisions. Our aim is to study how the data structure used in the merge step affects performance. To this end, we experimentally evaluated three data structures implemented inside a DBMS. The results show that it is worthwhile to implement an efficient data structure for run merging, even on modern commodity computers, which are usually disk-bound. Moreover, we demonstrate that using a loser tree is more efficient than both the naive approach and the heap-based one.
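
To make the merge step concrete, the following is a minimal in-memory sketch of a k-way merge driven by a loser tree. It is an illustration only, not the implementation evaluated in the paper: the function name `loser_tree_merge`, the array layout, and the use of Python iterables in place of on-disk runs are all assumptions made for this example.

```python
def loser_tree_merge(runs):
    """Merge already-sorted iterables with a loser tree (illustrative sketch).

    Hypothetical helper: real external sorts read runs from disk in blocks;
    here each run is simply a sorted Python iterable.
    """
    k = len(runs)
    if k == 0:
        return
    iters = [iter(r) for r in runs]
    done = object()                       # marks an exhausted run
    keys = [next(it, done) for it in iters]

    def beats(a, b):
        """True if the head of run a must be output before the head of run b."""
        ka, kb = keys[a], keys[b]
        if ka is done:
            return False
        if kb is done:
            return True
        # Ties go to the lower run index, which keeps the merge stable.
        return ka < kb or (ka == kb and a < b)

    # tree[0] holds the overall winner; tree[1..k-1] hold the loser of the
    # match played at each internal node. Leaf i sits at conceptual node k + i.
    tree = [0] * k

    def build(node):
        """Play the initial tournament; return the winner below `node`."""
        if node >= k:
            return node - k
        left, right = build(2 * node), build(2 * node + 1)
        if beats(right, left):
            tree[node] = left
            return right
        tree[node] = right
        return left

    tree[0] = build(1)

    while keys[tree[0]] is not done:
        w = tree[0]
        yield keys[w]
        keys[w] = next(iters[w], done)    # refill from the run that just won
        node = (w + k) // 2
        while node >= 1:                  # replay matches from leaf to root
            if beats(tree[node], w):
                tree[node], w = w, tree[node]
            node //= 2
        tree[0] = w


runs = [[1, 4, 9], [2, 3, 10], [], [5, 6, 7, 8]]
merged = list(loser_tree_merge(runs))
```

In this sketch, each output element triggers one comparison per tree level on the path from the refilled leaf to the root, because every internal node stores only the loser of its match. The Python standard library's `heapq.merge` provides a heap-based merge that can serve as a baseline for comparison.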


