Processing Database Joins over a Shared-Nothing System of Multicore Machines

04/25/2018
by Abhirup Chakraborty, et al.
Association for Computing Machinery

To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper examines the feasibility of scaling up such a shared-nothing system while processing a compute- and communication-intensive workload: distributed joins. By exploiting the multiple processing cores within each machine, we implement a join processing system that parallelizes computation within each node, pipelines computation with communication, parallelizes communication by allowing multiple simultaneous data transfers (send/receive), and removes synchronization barriers (a scalability bottleneck in distributed data processing systems). Our experimental results show that, using only four threads per node, the framework achieves a 3.5x gain in intra-node performance compared with a single-threaded counterpart. Moreover, with the join processing workload, the cluster-wide performance (and speedup) is observed to be dictated by the intra-node computational load; this property yields a near-linear speedup as the number of nodes in the system increases, a feature much desired in modern large-scale data processing systems.
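
To make the idea of intra-node parallelism concrete, the following is a minimal sketch of a partitioned hash join run by a small pool of worker threads on one node. It is an illustration only, not the system described in the paper: the function names (`partition`, `join_partition`, `parallel_hash_join`), the tuple layout (join key in position 0), and the default of four workers are all assumptions made for this example. It also leaves out the network transfers and pipelining the abstract mentions, and in CPython the global interpreter lock means these threads show the structure of the technique rather than a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_parts):
    """Split rows into num_parts buckets by hashing the join key (row[0])."""
    parts = [[] for _ in range(num_parts)]
    for row in rows:
        parts[hash(row[0]) % num_parts].append(row)
    return parts


def join_partition(left_rows, right_rows):
    """Hash-join one pair of partitions: build on left, probe with right."""
    table = defaultdict(list)
    for key, left_val in left_rows:
        table[key].append(left_val)
    out = []
    for key, right_val in right_rows:
        for left_val in table.get(key, []):
            out.append((key, left_val, right_val))
    return out


def parallel_hash_join(left, right, num_workers=4):
    """Join two lists of (key, value) tuples using num_workers threads.

    Rows with equal keys always land in the same partition index, so each
    pair of partitions can be joined independently, with no barrier between
    workers until the results are collected.
    """
    left_parts = partition(left, num_workers)
    right_parts = partition(right, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(join_partition, left_parts, right_parts)
        return [row for part in results for row in part]


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
    right = [(2, "x"), (3, "y"), (4, "z")]
    print(sorted(parallel_hash_join(left, right)))
```

In a shared-nothing cluster the same hash partitioning would typically decide which node receives each row, with each node then applying a per-thread split like the one above to its local share.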
