Making RDBMSs Efficient on Graph Workloads Through Predefined Joins

08/24/2021
by Guodong Jin, et al.

Joins in native graph database management systems (GDBMSs) are predefined to the system as edges, which are indexed in adjacency list indices and serve as pointers. This contrasts with, and can be more performant than, value-based joins in RDBMSs, and has led researchers to investigate ways to integrate predefined joins directly into RDBMSs. Existing approaches adopt a strict separation of graph and relational data and processors, where a graph-specific processor uses left-deep plans and index nested loop joins for a subset of joins. This may be suboptimal and may lead to non-sequential scans of data in some queries. We propose a purely relational approach to integrating predefined joins in columnar RDBMSs that uses row IDs (RIDs) of tuples as pointers. Users can predefine equality joins between any two tables, which leads to materializing RIDs in extended tables and, optionally, in RID indices. Instead of using the RID index to perform the join directly, we use it primarily in hash joins to generate semi-join filters that are passed to scans using sideways information passing, which ensures that scans remain sequential. In some settings, we also use RID indices to reduce the number of joins in query plans. Our approach does not introduce any graph-specific system components, can execute predefined joins on any join plan, and can improve performance on any workload that contains equality joins that can be predefined. We integrated our approach into DuckDB and call the resulting system GRainDB. We demonstrate that GRainDB substantially improves the performance of DuckDB on relational and graph workloads with large many-to-many joins, making it competitive with a state-of-the-art GDBMS, and incurs no major overheads otherwise.
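The idea of using a RID index to build a semi-join filter, rather than to perform the join itself, can be illustrated with a small sketch. This is not GRainDB's implementation: the tables `customer` and `orders`, the helper `predefine_join`, and the function `join_orders_for_city` are hypothetical names chosen for illustration, columns are plain Python lists, and a row's RID is assumed to be its position in the column.

```python
from collections import defaultdict

# Hypothetical columnar tables; a row's RID is its position in each column.
customer = {"id": [10, 20, 30], "city": ["Oslo", "Rome", "Oslo"]}
orders = {"cust_id": [20, 10, 30, 10, 20], "amount": [5, 7, 9, 11, 13]}


def predefine_join(fk_col, pk_col):
    """Materialize RIDs for the equality join fk_col = pk_col.

    Returns the RID column for the extended table (one pk RID per fk row)
    and a RID index mapping each pk RID to the fk RIDs that join with it.
    """
    pk_to_rid = {value: rid for rid, value in enumerate(pk_col)}
    fk_rids = [pk_to_rid[value] for value in fk_col]
    rid_index = defaultdict(list)
    for fk_rid, pk_rid in enumerate(fk_rids):
        rid_index[pk_rid].append(fk_rid)
    return fk_rids, rid_index


# Predefining the join extends `orders` with a RID column and builds the index.
orders["cust_rid"], rid_index = predefine_join(orders["cust_id"], customer["id"])


def join_orders_for_city(city):
    """Hash join of filtered customers with orders, using a RID-based filter."""
    # Build side: hash table keyed by customer RID for the selected customers.
    build = {
        rid: customer["id"][rid]
        for rid, c in enumerate(customer["city"])
        if c == city
    }

    # Semi-join filter: a bitmask over order RIDs, derived from the RID index.
    mask = [False] * len(orders["amount"])
    for cust_rid in build:
        for order_rid in rid_index.get(cust_rid, []):
            mask[order_rid] = True

    # Probe side: the filter is passed sideways to the scan of `orders`,
    # which still visits rows in RID order and skips the filtered-out ones.
    result = []
    for order_rid, keep in enumerate(mask):
        if keep:
            cust_rid = orders["cust_rid"][order_rid]
            result.append((build[cust_rid], orders["amount"][order_rid]))
    return result


print(join_orders_for_city("Oslo"))
```

In this sketch the RID index is consulted only while building the filter; the probe side remains an in-order pass over `orders`, which is the property the abstract describes as ensuring sequential scans.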


research
10/13/2019

LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans

The specific characteristics of graph workloads make it hard to design a...
research
03/31/2020

A+ Indexes: Lightweight and Highly Flexible Adjacency Lists for Graph Database Management Systems

Graph database management systems (GDBMSs) are highly optimized to perfo...
research
01/19/2023

Work-Efficient Query Evaluation with PRAMs

The paper studies query evaluation in parallel constant time in the PRAM...
research
11/16/2013

The Optimization of Running Queries in Relational Databases Using ANT-Colony Algorithm

The issue of optimizing queries is a cost-sensitive process and with res...
research
02/28/2022

Efficient Massively Parallel Join Optimization for Large Queries

Modern data analytical workloads often need to run queries over a large ...
research
03/01/2019

Parallel Index-based Stream Join on a Multicore CPU

There is increasing interest in using multicore processors to accelerate...
research
06/08/2017

Optimal parameters for bloom-filtered joins in Spark

In this paper, we present an algorithm that joins relational database ta...
