Integrating Column-Oriented Storage and Query Processing Techniques Into Graph Database Management Systems

03/03/2021
by Pranjal Gupta, et al.

We revisit column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMSs). Like column-oriented RDBMSs, GDBMSs support read-heavy analytical workloads, but these workloads have fundamentally different data access patterns from traditional analytical workloads. We first derive a set of desiderata for optimizing the storage and query processors of GDBMSs based on their access patterns. We then present the design of columnar storage, compression, and query processing techniques based on these desiderata. In addition to showing direct integration of existing techniques from columnar RDBMSs, we also propose novel ones that are optimized for GDBMSs. These include: (1) a novel list-based query processor, which avoids the expensive data copies of traditional block-based processors under many-to-many joins and avoids materializing adjacency lists in intermediate tuples; (2) a new data structure we call single-indexed edge property pages, with an accompanying edge ID scheme; and (3) a new application of Jacobson's bit vector index for compressing NULL values and empty lists. We integrated our techniques into the GraphflowDB in-memory GDBMS. Through extensive experiments, we demonstrate the scalability and query performance benefits of our techniques.
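To make the NULL-compression idea concrete, the following is a minimal sketch of a column that stores only its non-NULL values, plus a bit vector and a per-block rank index used to locate them. It is not the paper's implementation: the class name `NullCompressedColumn`, the `BLOCK` size, and the single-level rank layout are assumptions made for illustration, whereas Jacobson's index uses a multi-level structure with lookup tables to answer rank queries in constant time.

```python
from typing import List, Optional


class NullCompressedColumn:
    """Hypothetical column that keeps non-NULL values densely packed."""

    BLOCK = 16  # bits per rank block; an arbitrary illustrative choice

    def __init__(self, values: List[Optional[int]]) -> None:
        self.n = len(values)
        # bits[i] is 1 when position i holds a non-NULL value.
        self.bits = [0 if v is None else 1 for v in values]
        # Dense array holding only the non-NULL values, in original order.
        self.dense = [v for v in values if v is not None]
        # block_rank[b] = number of set bits before the start of block b.
        self.block_rank: List[int] = []
        running = 0
        for start in range(0, self.n, self.BLOCK):
            self.block_rank.append(running)
            running += sum(self.bits[start:start + self.BLOCK])

    def rank(self, i: int) -> int:
        """Count non-NULL positions strictly before i, for 0 <= i < n."""
        b = i // self.BLOCK
        # Precomputed prefix count plus a short scan inside the block.
        return self.block_rank[b] + sum(self.bits[b * self.BLOCK:i])

    def get(self, i: int) -> Optional[int]:
        """Return the value at logical position i, or None if it is NULL."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        if self.bits[i] == 0:
            return None
        # The rank of i is the offset of its value in the dense array.
        return self.dense[self.rank(i)]


if __name__ == "__main__":
    col = NullCompressedColumn([None, 5, None, None, 7] + [None] * 20 + [9])
    print(col.get(1), col.get(4), col.get(25), col.get(0))
```

In this sketch a sparse column of 26 logical positions stores only three values, and each lookup scans at most one block of bits. The same layout could represent empty adjacency lists by setting a bit only for vertices whose list is non-empty, which is again an assumption about how the idea carries over rather than a description of GraphflowDB.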


research
05/19/2021

Revisiting Data Compression in Column-Stores

Data compression is widely used in contemporary column-oriented DBMSes t...
research
03/31/2020

A+ Indexes: Lightweight and Highly Flexible Adjacency Lists for Graph Database Management Systems

Graph database management systems (GDBMSs) are highly optimized to perfo...
research
03/07/2018

Fast in-database cross-matching of high-cadence, high-density source lists with an up-to-date sky model

Coming high-cadence wide-field optical telescopes will image hundreds of...
research
10/13/2019

LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans

The specific characteristics of graph workloads make it hard to design a...
research
06/19/2023

COLE: A Column-based Learned Storage for Blockchain Systems

Blockchain systems suffer from high storage costs as every node needs to...
research
12/26/2021

Airphant: Cloud-oriented Document Indexing

Modern data warehouses can scale compute nodes independently of storage....
research
04/29/2020

Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats

The proliferation of modern data processing tools has given rise to open...
