An LSM-based Tuple Compaction Framework for Apache AsterixDB

10/17/2019
by Wail Y. Alkowaileet, et al.

Document database systems store self-describing records, such as JSON, "as-is" without requiring the users to pre-define a schema. This provides users with the flexibility to change the structure of incoming records without worrying about taking the system offline or hindering the performance of currently running queries. However, the flexibility of such systems does not come without a cost. The large amount of redundancy in the stored records can introduce an unnecessary storage overhead and impact query performance. Our focus in this paper is to address the storage overhead issue by introducing a tuple compactor framework that infers and extracts the schema from self-describing records during the data ingestion process. As many prominent document store systems, such as MongoDB and Couchbase, adopt Log Structured Merge (LSM) trees in their storage engines, our framework exploits LSM lifecycle events to piggyback the schema inference and extraction operations. We have implemented and empirically evaluated our approach to measure its impact on storage, data ingestion, and query performance in the context of Apache AsterixDB.
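To make the flush-time idea concrete, here is a minimal sketch. It assumes that inference piggybacks on the flush of an in-memory LSM component, that records are flat dictionaries, and that nested values are treated as opaque. The names `MemoryComponent`, `DiskComponent`, `infer_schema`, and `merge_schemas` are hypothetical illustrations, not AsterixDB's API. Incoming records are buffered as-is. On flush, the union of their field names and observed types is merged into the schema carried over from earlier flushes, and each record is written as a values-only tuple ordered by that schema, so field names are no longer repeated per record.

```python
from dataclasses import dataclass, field
from typing import Any

# Sentinel distinguishing "field absent" from a JSON null (None).
_MISSING = object()


def infer_schema(records: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each field name (first-seen order) to the set of observed type names."""
    schema: dict[str, set[str]] = {}
    for rec in records:
        for name, value in rec.items():
            schema.setdefault(name, set()).add(type(value).__name__)
    return schema


def merge_schemas(old: dict[str, set[str]], new: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union two schemas without mutating either; old fields keep their positions."""
    merged = {name: set(types) for name, types in old.items()}
    for name, types in new.items():
        merged.setdefault(name, set()).update(types)
    return merged


@dataclass
class DiskComponent:
    schema: dict[str, set[str]]    # schema snapshot covering every tuple below
    tuples: list[tuple[Any, ...]]  # values only, ordered by schema fields

    def read(self, i: int) -> dict[str, Any]:
        """Rebuild the self-describing record from the schema and stored values."""
        names = list(self.schema)
        return {n: v for n, v in zip(names, self.tuples[i]) if v is not _MISSING}


@dataclass
class MemoryComponent:
    records: list[dict[str, Any]] = field(default_factory=list)

    def insert(self, rec: dict[str, Any]) -> None:
        self.records.append(rec)  # ingested "as-is", no predefined schema

    def flush(self, prev_schema: dict[str, set[str]]) -> DiskComponent:
        # Hypothetical flush hook: infer, merge with the prior schema, then compact.
        schema = merge_schemas(prev_schema, infer_schema(self.records))
        names = list(schema)
        tuples = [tuple(rec.get(n, _MISSING) for n in names) for rec in self.records]
        self.records = []
        return DiskComponent(schema, tuples)


mem = MemoryComponent()
mem.insert({"id": 1, "name": "Ann", "age": 30})
mem.insert({"id": 2, "name": "Bob"})
mem.insert({"id": 3, "name": "Cy", "age": "unknown"})
disk = mem.flush(prev_schema={})
print(disk.read(1))                # {'id': 2, 'name': 'Bob'}
print(sorted(disk.schema["age"]))  # ['int', 'str']
```

Passing the previous component's schema into the next flush keeps the schema growing monotonically as new fields appear. Storing a set of types per field tolerates heterogeneous values (here `age` is both an integer and a string) instead of rejecting them.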

research
11/22/2021

Columnar Formats for Schemaless LSM-based Document Stores

In the last decade, document store database systems have gained more tra...
research
12/18/2018

Demonstration of a Multiresolution Schema Mapping System

Enterprise databases usually contain large and complex schemas. Authorin...
research
09/01/2021

MORTAL: A Tool of Automatically Designing Relational Storage Schemas for Multi-model Data through Reinforcement Learning

Considering relational databases having powerful capabilities in handlin...
research
09/07/2018

Hierarchical Characteristic Set Merging for Optimizing SPARQL Queries in Heterogeneous RDF

Characteristic sets (CS) organize RDF triples based on the set of proper...
research
08/05/2020

PrismDB: Read-aware Log-structured Merge Trees for Heterogeneous Storage

In recent years, emerging hardware storage technologies have focused on ...
research
10/02/2018

Heterogeneous Replica for Query on Cassandra

Cassandra is a popular structured storage system with high-performance, ...
research
11/04/2019

Incremental extraction of a NoSQL database model using an MDA-based process

In recent years, the need to use NoSQL systems to store and exploit big ...
