
Scalable Querying of Nested Data

by Jaclyn Smith, et al.

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform non-trivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions. In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs.
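To make the idea of a shredded representation concrete, here is a minimal single-machine sketch. It is not the paper's framework or algorithm: the function names (`shred`, `total_per_label`, `unshred`), the use of integer labels, and the customer/order data are all assumptions chosen for illustration. The sketch replaces each nested collection with a label, stores the inner rows in a separate flat relation keyed by that label, runs an aggregation over the flat relation only, and then reattaches the results to the top-level rows.

```python
from collections import defaultdict


def shred(customers):
    """Split a two-level nested collection into two flat relations.

    Hypothetical helper: labels are simply the positions of the parent rows.
    """
    top = []     # one row per customer; the nested field becomes a label
    orders = []  # one row per order, tagged with its parent's label
    for label, cust in enumerate(customers):
        top.append({"name": cust["name"], "orders": label})
        for order in cust["orders"]:
            orders.append({"label": label, **order})
    return top, orders


def total_per_label(orders):
    """Flat aggregation over the shredded inner relation."""
    totals = defaultdict(float)
    for row in orders:
        totals[row["label"]] += row["amount"]
    return totals


def unshred(top, totals):
    """Reattach per-label results to the top-level rows."""
    return [
        {"name": row["name"], "total": totals.get(row["orders"], 0.0)}
        for row in top
    ]


# Illustrative input: one customer with two orders, one with none.
customers = [
    {"name": "alice", "orders": [{"amount": 10.0}, {"amount": 5.0}]},
    {"name": "bob", "orders": []},
]

top, orders = shred(customers)
result = unshred(top, total_per_label(orders))
print(result)
```

Because the inner relation is flat, a distributed engine could in principle partition it by any key rather than by parent row, which is one reason a flat layout may be easier to balance when a few parents own most of the inner rows. The sketch does not attempt any skew handling itself.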




Query Lifting: Language-integrated query for heterogeneous nested collections

Language-integrated query based on comprehension syntax is a powerful te...

Generating collection queries from proofs

Nested relations, built up from atomic types via tupling and set types, ...

On a conjecture by Ben-Akiva and Lerman about the nested logit model

We prove a conjecture of Ben-Akiva and Lerman (1985) regarding the rando...

Modular Synthesis of Divide-and-Conquer Parallelism for Nested Loops (Extended Version)

We propose a methodology for automatic generation of divide-and-conquer ...

Rumble: data independence when data is in a mess

This paper introduces Rumble, an engine that executes JSONiq queries on ...

Supercharging the APGAS Programming Model with Relocatable Distributed Collections

In this article we present our relocatable distributed collections libra...

Ensembles of Nested Dichotomies with Multiple Subset Evaluation

A system of nested dichotomies is a method of decomposing a multi-class ...