Accelerating Big-Data Sorting Through Programmable Switches

03/25/2021
by Yamit Barshatz-Schneor, et al.

Sorting is a fundamental and extensively studied problem. It plays an important role in databases, as many queries can be served much faster if the relations are first sorted. One of the most popular sorting algorithms in databases is merge sort. In modern data centers, data is stored in storage servers, while processing takes place in compute servers. Hence, to compute queries on the data, the data must travel through the network from the storage servers to the compute servers. This creates an opportunity to use programmable switches to perform partial sorting and thereby accelerate the sorting process at the server side, since data packets pass through the switch in any case on their way to the server. Alas, programmable switches offer a very restricted and non-intuitive programming model, which makes realizing this idea nontrivial. We devised a novel partial sorting algorithm that fits the programming model and restrictions of programmable switches and can expedite merge sort at the server. We also utilize the built-in parallelism of the switch to divide the data into sequential ranges. Thus, the server only needs to sort each range separately and then concatenate the ranges into one sorted stream. This way, the server sorts smaller sections, each of which is already partially sorted. Hence, the server does less work, and the access pattern becomes more virtual-memory friendly. We evaluated the performance improvements obtained when utilizing our partial sorting algorithm over several data stream compositions with various switch configurations. Our study exhibits an improvement of 20 run-time when using our approach compared to plain sorting on the original stream.
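
The pipeline described in the abstract (split the stream into sequential ranges, partially sort each range in the network, then finish sorting per range and concatenate at the server) can be illustrated with a small sketch. This is not the authors' switch algorithm: the function names `split_into_ranges`, `partial_sort`, and `server_sort` are hypothetical, the bounded min-heap is only a stand-in for whatever limited state a real switch pipeline can hold, and the keys are assumed to be non-negative integers with a known maximum.

```python
import heapq
from typing import Iterable, List


def split_into_ranges(stream: Iterable[int], num_ranges: int, max_value: int) -> List[List[int]]:
    """Assign each key to one of num_ranges contiguous value ranges.

    Assumption: keys are integers in [0, max_value].
    """
    # width = ceil((max_value + 1) / num_ranges), so every key maps to a valid range
    width = (max_value + num_ranges) // num_ranges
    ranges: List[List[int]] = [[] for _ in range(num_ranges)]
    for key in stream:
        ranges[key // width].append(key)
    return ranges


def partial_sort(values: Iterable[int], buffer_size: int) -> List[int]:
    """Emit a partially sorted stream using a buffer of at most buffer_size keys.

    Stand-in for in-switch partial sorting: the output is not fully sorted,
    but it tends to contain longer ascending runs than the input.
    """
    heap: List[int] = []
    out: List[int] = []
    for v in values:
        heapq.heappush(heap, v)
        if len(heap) > buffer_size:
            out.append(heapq.heappop(heap))  # evict the smallest buffered key
    while heap:  # drain whatever is still buffered
        out.append(heapq.heappop(heap))
    return out


def server_sort(ranges: List[List[int]]) -> List[int]:
    """Finish sorting each range independently, then concatenate in range order."""
    result: List[int] = []
    for r in ranges:
        # Python's sorted() is a merge-sort variant that exploits existing runs
        result.extend(sorted(r))
    return result


data = [37, 5, 92, 18, 64, 41, 3, 77, 29, 50]
ranges = split_into_ranges(data, num_ranges=4, max_value=99)
pre_sorted = [partial_sort(r, buffer_size=2) for r in ranges]
final = server_sort(pre_sorted)
```

Because the ranges cover disjoint, increasing key intervals, concatenating the individually sorted ranges yields a fully sorted stream without a final global merge, which is the property the abstract relies on.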


