Scalable Querying of Nested Data

11/12/2020
by Jaclyn Smith, et al.

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform non-trivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions. In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs.
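As a rough illustration of what a shredded representation means, the sketch below splits a small two-level nested collection into two flat collections linked by a label, then rebuilds the nested form. It is a toy example under stated assumptions, not the framework's actual encoding or API: the `shred` and `unshred` helpers, the `orders_label` field, and the sample customer data are all hypothetical names introduced here.

```python
from collections import defaultdict


def shred(customers):
    """Split nested customers into a flat top-level list and a flat orders list.

    Hypothetical helper: each inner "orders" bag is replaced by an integer
    label, and the inner rows are stored separately, tagged with that label.
    """
    top = []
    orders = []
    for label, customer in enumerate(customers):
        top.append({"name": customer["name"], "orders_label": label})
        for order in customer["orders"]:
            orders.append({"label": label, "item": order["item"], "qty": order["qty"]})
    return top, orders


def unshred(top, orders):
    """Rebuild the nested collection by grouping the flat orders by label."""
    by_label = defaultdict(list)
    for order in orders:
        by_label[order["label"]].append({"item": order["item"], "qty": order["qty"]})
    return [
        {"name": row["name"], "orders": by_label[row["orders_label"]]}
        for row in top
    ]


# Illustrative sample data (not from the paper).
customers = [
    {"name": "alice", "orders": [{"item": "pen", "qty": 2}, {"item": "ink", "qty": 1}]},
    {"name": "bob", "orders": []},
]

top, orders = shred(customers)
restored = unshred(top, orders)
```

Because both `top` and `orders` are flat, each can be partitioned independently, so a single customer with a very large inner collection does not have to be held in one top-level record. That is the intuition behind why a flat, label-linked layout can help with skewed cardinalities.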

