
Offset-value coding in database query processing

by Goetz Graefe, et al.

Recent work shows how offset-value coding speeds up database query execution: not only sorting but also duplicate removal and grouping (aggregation) in sorted streams, order-preserving exchange (shuffle), merge join, and more. It already saves thousands of CPUs in Google's Napa and F1 Query systems, e.g., in grouping algorithms and in log-structured merge-forests. To realize the full benefit of interesting orderings, however, query execution algorithms must not only consume and exploit offset-value codes but also produce offset-value codes for the next operator in the pipeline. Our research has investigated ways to produce offset-value codes without comparing successive output rows column by column. This short paper introduces a new theorem and, based on its proof and a simple corollary, describes in detail how order-preserving algorithms (from filter to merge join and even shuffle) can compute offset-value codes for their outputs. These calculations are surprisingly simple and very efficient.
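To make the idea of producing codes without column-by-column comparisons concrete, here is a minimal Python sketch of an order-preserving filter. It is an illustration under stated assumptions, not the paper's own formulation: rows are fixed-arity integer tuples sorted ascending, a code is the pair (arity - offset, value at offset), and the code of a surviving row is assumed to be the maximum of its own code and the codes of the rows dropped since the previous survivor. The names `ovc`, `encode`, and `filter_with_codes` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple

Row = Tuple[int, ...]
# Assumed encoding: (arity - offset, value at offset); (0, 0) marks a duplicate row.
Code = Tuple[int, int]


def ovc(prev: Row, row: Row) -> Code:
    """Code of `row` relative to its predecessor `prev` (both of the same arity)."""
    arity = len(row)
    for offset, (p, r) in enumerate(zip(prev, row)):
        if p != r:
            return (arity - offset, r)
    return (0, 0)  # all columns equal


def encode(rows: Sequence[Row]) -> List[Tuple[Row, Code]]:
    """Attach codes to sorted rows by comparing columns (the baseline method)."""
    out: List[Tuple[Row, Code]] = []
    prev: Optional[Row] = None
    for row in rows:
        # The first row has no predecessor: offset 0, value of the first column.
        code = (len(row), row[0]) if prev is None else ovc(prev, row)
        out.append((row, code))
        prev = row
    return out


def filter_with_codes(
    coded: Iterable[Tuple[Row, Code]], keep: Callable[[Row], bool]
) -> Iterator[Tuple[Row, Code]]:
    """Filter a coded, sorted stream and emit output codes without touching columns."""
    pending: Code = (0, 0)  # maximum code seen since the last emitted row
    for row, code in coded:
        pending = max(pending, code)  # tuples compare lexicographically
        if keep(row):
            yield row, pending
            pending = (0, 0)


rows = [(1, 2, 3), (1, 2, 5), (1, 4, 0), (2, 0, 0), (2, 0, 0)]
keep = lambda r: r[2] != 5
filtered = list(filter_with_codes(encode(rows), keep))
```

In this sketch the filter inspects only the codes of its input, never the column values, and its output matches what `encode` would compute from scratch on the surviving rows.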




Robust and Efficient Sorting with Offset-Value Coding

Sorting and searching are large parts of database query processing, e.g....

Sort-based grouping and aggregation

Database query processing requires algorithms for duplicate removal, gro...

SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning

SkinnerDB is designed from the ground up for reliable join ordering. It ...

Non-recursive Approach for Sort-Merge Join Operation

Several algorithms have been developed over the years to perform join op...

Covering the Relational Join

In this paper, we initiate a theoretical study of what we call the join ...

Scalable Relational Query Processing on Big Matrix Data

The use of large-scale machine learning methods is becoming ubiquitous i...

Efficiently Charting RDF

We propose a visual query language for interactively exploring large-sca...