Fast Access to Columnar, Hierarchically Nested Data via Code Transformation

08/20/2017
by Jim Pivarski, et al.

Big Data query systems represent data in a columnar format for fast, selective access, and in some cases (e.g. Apache Drill), perform calculations directly on the columnar data without row materialization, avoiding runtime costs. However, many analysis procedures cannot be easily or efficiently expressed as SQL. In High Energy Physics, the majority of data processing requires nested loops with complex dependencies. When faced with tasks like these, the conventional approach is to convert the columnar data back into an object form, usually with a performance price. This paper describes a new technique to transform procedural code so that it operates on hierarchically nested, columnar data natively, without row materialization. It can be viewed as a compiler pass on the typed abstract syntax tree, rewriting references to objects as columnar array lookups. We will also present performance comparisons between transformed code and conventional object-oriented code in a High Energy Physics context.
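
To make the idea of rewriting object references as columnar array lookups concrete, here is a minimal sketch in Python. It is not the paper's implementation: the data, the attribute name `pt`, the offsets-plus-content layout, and all function names are assumptions chosen for illustration, and the "transformed" function is written by hand to show the kind of code such a pass might produce.

```python
# Hypothetical data: each event holds a variable-length list of particles,
# and each particle has a single attribute "pt".
events = [
    [{"pt": 10.0}, {"pt": 20.0}],
    [],
    [{"pt": 5.0}, {"pt": 7.5}, {"pt": 1.5}],
]


def to_columns(events):
    """Flatten the nested lists into an offsets array and one content array.

    Event i owns the content slots offsets[i] up to (not including) offsets[i + 1].
    """
    offsets = [0]
    pt = []
    for event in events:
        for particle in event:
            pt.append(particle["pt"])
        offsets.append(len(pt))
    return offsets, pt


def sum_pt_objects(events):
    """Object-style code: nested loops over materialized objects."""
    totals = []
    for event in events:
        total = 0.0
        for particle in event:
            total += particle["pt"]
        totals.append(total)
    return totals


def sum_pt_columnar(offsets, pt):
    """The same nested loops, with object references replaced by array lookups."""
    totals = []
    for i in range(len(offsets) - 1):
        total = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            total += pt[j]
        totals.append(total)
    return totals


offsets, pt = to_columns(events)
totals_from_objects = sum_pt_objects(events)
totals_from_columns = sum_pt_columnar(offsets, pt)
```

Both functions compute the same per-event sums, but the columnar version only ever touches two flat arrays, so no particle or event objects need to be built. In this sketch the loop structure is unchanged; only the way each value is reached differs.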
