Scalable Querying of Nested Data

11/12/2020
by Jaclyn Smith, et al.

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform non-trivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions. In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs.


