
Processing Database Joins over a Shared-Nothing System of Multicore Machines

by Abhirup Chakraborty, et al.
Association for Computing Machinery

To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper looks into the feasibility of scaling up such a shared-nothing system while processing a compute- and communication-intensive workload: distributed joins. By exploiting multiple processing cores within the individual machines, we implement a system to process database joins that parallelizes computation within each node, pipelines the computation with communication, parallelizes the communication by allowing multiple simultaneous data transfers (send/receive), and removes synchronization barriers (a scalability bottleneck in a distributed data processing system). Our experimental results show that, using only four threads per node, the framework achieves a 3.5x gain in intra-node performance compared with a single-threaded counterpart. Moreover, with the join processing workload, the cluster-wide performance (and speedup) is observed to be dictated by the intra-node computational loads; this property brings a near-linear speedup with an increasing number of nodes in the system, a feature much desired in modern large-scale data processing systems.
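The idea of parallelizing join computation within a single node can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`partition`, `hash_join`, `parallel_join`), the choice of a hash-partitioned build/probe join, and the default thread count are all assumptions made for illustration. Each thread joins one pair of partitions independently, so no barrier is needed between partitions. Because of Python's global interpreter lock, the sketch shows only the structure of the technique, not its speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key_index, num_partitions):
    """Split rows into buckets by hashing the join key (hypothetical helper)."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[key_index]) % num_partitions].append(row)
    return buckets


def hash_join(left, right, left_key=0, right_key=0):
    """Build a hash table on `left`, then probe it with each row of `right`."""
    table = defaultdict(list)
    for row in left:
        table[row[left_key]].append(row)
    # Emit one concatenated tuple per matching (left, right) pair.
    return [l + r for r in right for l in table.get(r[right_key], [])]


def parallel_join(left, right, num_threads=4):
    """Join two relations on their first column, one partition pair per task."""
    left_parts = partition(left, 0, num_threads)
    right_parts = partition(right, 0, num_threads)
    # Rows with equal keys land in the same partition index on both sides,
    # so partition pairs can be joined independently of each other.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = pool.map(hash_join, left_parts, right_parts)
    return [row for part in results for row in part]


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (3, "c")]
    items = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
    print(sorted(parallel_join(orders, items)))
```

In a distributed setting the same partitioning step would decide which node receives each row; that part, along with the pipelining of computation with communication described in the abstract, is left out of this sketch.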




System G Distributed Graph Database

Motivated by the need to extract knowledge and value from interconnected...

RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework (Extended Version)

Frequent itemset mining (FIM) is a highly computational and data intensi...

HTCondor data movement at 100 Gbps

HTCondor is a major workload management system used in distributed high ...

Timestamp tokens: a better coordination primitive for data-processing systems

Distributed data processing systems have advanced through models that ex...

The Specialized High-Performance Network on Anton 3

Molecular dynamics (MD) simulation, a computationally intensive method t...

GX-Plug: a Middleware for Plugging Accelerators to Distributed Graph Processing

Recently, research communities highlight the necessity of formulating a ...