Columnar Formats for Schemaless LSM-based Document Stores

11/22/2021
by Wail Y. Alkowaileet, et al.

In the last decade, document store database systems have gained traction for storing and querying large volumes of semi-structured data. However, the flexibility of the document stores' data models has limited their ability to store data in a column-major layout, making them less performant for analytical workloads than column-store relational databases. In this paper, we propose several techniques, tailored to document stores and based on piggy-backing on Log-Structured Merge (LSM) tree events, for storing document data in a columnar layout. We first extend the Dremel format, a popular on-disk columnar format for semi-structured data, to comply with document stores' flexible data model. We then introduce two columnar layouts for organizing and storing data in LSM-based storage. We also highlight the potential of using query compilation techniques for document stores, where values' types are known only at runtime. We have implemented our techniques in Apache AsterixDB and evaluated their impact on storage, data ingestion, and query performance. Our experiments show significant performance gains, improving query execution time by orders of magnitude while minimally impacting ingestion performance.
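To make the core difficulty concrete, the following is a minimal Python sketch of shredding schemaless documents into columns when a field's type is known only at runtime. It is an illustration only, not the extended Dremel format or either of the layouts proposed in the paper: the function name `shred`, the `(path, type)` column key, and the sample documents are all assumptions made for this example, and arrays (which Dremel handles with repetition levels) are deliberately treated as opaque values.

```python
from collections import defaultdict


def shred(docs):
    """Split documents into columns keyed by (dotted path, runtime type name).

    Hypothetical helper for illustration; not the paper's on-disk format.
    Each column stores (row index, value) pairs so that missing fields
    simply have no entry for that row.
    """
    columns = defaultdict(list)

    def walk(row, prefix, value):
        if isinstance(value, dict):
            # Descend into nested objects, extending the dotted path.
            for key, child in value.items():
                walk(row, f"{prefix}.{key}" if prefix else key, child)
        else:
            # Leaf value (lists are kept whole in this simplified sketch).
            # The same path may hold different types in different documents,
            # so each observed type gets its own column.
            columns[(prefix, type(value).__name__)].append((row, value))

    for row, doc in enumerate(docs):
        walk(row, "", doc)
    return dict(columns)


# Two documents with no declared schema: "id" changes type and "name" is optional.
docs = [
    {"id": 1, "name": "a", "geo": {"lat": 1.5}},
    {"id": "two", "geo": {"lat": 2}},
]
columns = shred(docs)
```

In this sketch an analytical query over `geo.lat` would read only the `("geo.lat", "float")` and `("geo.lat", "int")` columns rather than every whole document, which is the kind of saving a columnar layout is meant to provide.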
