Hybrid Materialization in a Disk-Based Column-Store

04/17/2023
by Evgeniy Klyuchikov, et al.

In column-oriented query processing, a materialization strategy determines when lightweight positions (row IDs) are translated into tuples. It is an important part of a column-store architecture, since it defines the class of supported query plans and therefore affects overall system performance. In this paper we continue investigating materialization strategies for a distributed disk-based column-store. We start by demonstrating cases in which existing approaches impose fundamental limitations on the resulting system performance. To address them, we propose a new hybrid materialization model. Its main feature is the ability to manipulate both positions and values at the same time. This way, the query engine can combine the advantages of all existing strategies and support a new class of query plans. Moreover, hybrid materialization allows the query engine to customize the materialization policy of individual attributes. We describe our vision of how hybrid materialization can be implemented in a columnar system, using PosDB, a distributed, disk-based column-store, as an example. We present the necessary data structures and the internals of a hybrid operator, and describe the algebra of such operators. Based on this implementation, we evaluate the performance of late, ultra-late, and hybrid materialization strategies in several scenarios based on TPC-H queries. Our experiments demonstrate that hybrid materialization is almost two times faster than its counterparts, while providing a more flexible query model.
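
To make the idea of carrying positions and values together more concrete, here is a minimal toy sketch in Python. It is not PosDB's implementation: the in-memory `TABLE`, the `HybridBlock` class, and the functions `filter_keep_values` and `materialize` are all hypothetical names introduced only for illustration. The sketch assumes that a filter can hand downstream operators both the qualifying row IDs and the values it already read, so that a later materialization step does not need to re-read that column.

```python
from dataclasses import dataclass, field

# Toy in-memory "table": each attribute is stored as a separate column.
# (Hypothetical data; a real disk-based column-store reads columns from disk.)
TABLE = {
    "price": [10, 25, 7, 40, 25],
    "qty":   [1, 3, 2, 5, 4],
    "name":  ["a", "b", "c", "d", "e"],
}

@dataclass
class HybridBlock:
    """Hypothetical intermediate result: positions plus any values already read."""
    positions: list
    # attribute name -> list of values aligned with `positions`
    values: dict = field(default_factory=dict)

def filter_keep_values(table, attr, pred):
    """Scan one column; keep qualifying positions AND the values just read."""
    positions, vals = [], []
    for pos, v in enumerate(table[attr]):
        if pred(v):
            positions.append(pos)
            vals.append(v)
    return HybridBlock(positions, {attr: vals})

def materialize(table, block, attrs):
    """Translate positions into tuples, reusing values the block already carries."""
    cols = []
    for a in attrs:
        if a in block.values:
            # Value was carried along: no second access to this column.
            cols.append(block.values[a])
        else:
            # Position-only attribute: fetch values by row ID now.
            cols.append([table[a][p] for p in block.positions])
    return list(zip(*cols))

block = filter_keep_values(TABLE, "price", lambda v: v >= 25)
rows = materialize(TABLE, block, ["name", "price"])
```

In this sketch `price` is materialized early (its values travel with the block) while `name` stays position-only until the final step, which is one way to picture a per-attribute materialization policy.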

research
08/16/2023

Finding a Second Wind: Speeding Up Graph Traversal Queries in RDBMSs Using Column-Oriented Processing

Recursive queries and recursive derived tables constitute an important p...
research
04/27/2019

A computational model for analytic column stores

This work presents an abstract model for the computations performed by a...
research
04/08/2020

A Comparative Analysis of Knowledge Graph Query Performance

As Knowledge Graphs (KGs) continue to gain widespread momentum for use i...
research
04/20/2020

MorphStore: Analytical Query Engine with a Holistic Compression-Enabled Processing Model

In this paper, we present MorphStore, an open-source in-memory columnar ...
research
01/17/2021

Real-Time LSM-Trees for HTAP Workloads

Real-time data analytics systems such as SAP HANA, MemSQL, and IBM Wildf...
research
06/18/2017

Evolutionary Data Systems

Anyone in need of a data system today is confronted with numerous comple...
research
09/06/2022

An Adaptive Column Compression Family for Self-Driving Databases

Modern in-memory databases are typically used for high-performance workl...
