Scalable Querying of Nested Data

by Jaclyn Smith, et al.

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform non-trivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions. In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs.
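To make the idea of a shredded representation concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes a two-level nested collection of customers with inner bags of orders. The names `shred`, `unshred`, `total_per_customer`, and the `label` field are hypothetical and chosen only for illustration. The nested input is split into a flat top-level collection and a flat collection of inner rows tagged with a label, so a query over the inner bags can run on flat data without rebuilding the nesting.

```python
from collections import defaultdict

def shred(customers):
    """Split a nested collection into a flat top-level bag and a flat
    bag of inner rows; each inner row carries the label of its parent."""
    top, inner = [], []
    for label, customer in enumerate(customers):
        # The nested bag is replaced by a label that points into `inner`.
        top.append({"name": customer["name"], "orders": label})
        for order in customer["orders"]:
            inner.append({"label": label, **order})
    return top, inner

def unshred(top, inner):
    """Rebuild the nested collection from its shredded form."""
    by_label = defaultdict(list)
    for row in inner:
        by_label[row["label"]].append(
            {k: v for k, v in row.items() if k != "label"}
        )
    return [{"name": t["name"], "orders": by_label[t["orders"]]} for t in top]

def total_per_customer(top, inner):
    """Aggregate over the flat inner rows only, then attach names."""
    totals = defaultdict(float)
    for row in inner:
        totals[row["label"]] += row["amount"]
    return {t["name"]: totals[t["orders"]] for t in top}

# Illustrative data (made up for this sketch).
customers = [
    {"name": "alice", "orders": [{"item": "pen", "amount": 2.0},
                                 {"item": "ink", "amount": 5.0}]},
    {"name": "bob", "orders": []},
]

top, inner = shred(customers)
totals = total_per_customer(top, inner)
```

Because the inner rows form a flat collection, they could in principle be partitioned independently of the top-level rows, which is the property that makes load balancing under skewed inner cardinalities possible. How the framework described above actually partitions and rebalances data is not specified here.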
