Storyboard: Optimizing Precomputed Summaries for Aggregation

02/08/2020
by Edward Gan, et al.

An emerging class of data systems partition their data and precompute approximate summaries (i.e., sketches and samples) for each segment to reduce query costs. They can then aggregate and combine the segment summaries to estimate results without scanning the raw data. However, given limited storage space, each summary introduces approximation errors that affect query accuracy. For instance, systems that use existing mergeable summaries cannot reduce query error below the error of an individual precomputed summary. We introduce Storyboard, a query system that optimizes item frequency and quantile summaries for accuracy when aggregating over multiple segments. Compared to conventional mergeable summaries, Storyboard leverages additional memory available for summary construction and aggregation to derive a more precise combined result. Compared to standard summarization methods, this reduces error by up to 25x over interval aggregations and 4.4x over data cube aggregations on industrial datasets, with provable worst-case error guarantees.
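
To make the setting concrete, here is a minimal Python sketch of precomputing a small item-frequency summary per segment and answering a query by combining the summaries instead of scanning the raw data. It is not Storyboard's method: the truncated top-k counter, the function names `build_summary` and `aggregate`, and the toy data are all assumptions chosen for illustration. It only shows how per-segment approximation error carries into the aggregated estimate.

```python
from collections import Counter

def build_summary(segment, size):
    """Hypothetical lossy summary: keep only the `size` most frequent items."""
    return dict(Counter(segment).most_common(size))

def aggregate(summaries):
    """Combine segment summaries by summing the stored counts per item."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts from a mapping
    return total

# Toy data: three segments, each summarized with room for two items.
segments = [
    ["a", "a", "a", "b", "b", "c"],
    ["b", "b", "b", "c", "c", "a"],
    ["c", "c", "c", "a", "a", "b"],
]
summaries = [build_summary(seg, size=2) for seg in segments]

# Estimated counts from summaries versus exact counts from the raw data.
estimate = aggregate(summaries)
exact = Counter(item for seg in segments for item in seg)

# Each item is dropped from exactly one segment summary here, so every
# estimate undercounts by the count that segment discarded.
errors = {item: exact[item] - estimate[item] for item in exact}
print(estimate, exact, errors)
```

In this naive scheme each summary is built in isolation, so the undercounts from different segments simply add up in the combined result. That accumulation is the kind of aggregation error the abstract is concerned with.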

research
05/09/2019

Tight Lower Bound for Comparison-Based Quantile Summaries

Quantiles, such as the median or percentiles, provide concise and useful...
research
03/06/2018

Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries

Interactive analytics increasingly involves querying for quantiles over ...
research
09/19/2021

CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization

We study generating abstractive summaries that are faithful and factuall...
research
05/25/2021

Providing Meaningful Data Summarizations Using Exemplar-based Clustering in Industry 4.0

Data summarizations are a valuable tool to derive knowledge from large d...
research
03/11/2023

Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs

Estimating quantiles, like the median or percentiles, is a fundamental t...
research
08/21/2019

GeoBlocks: A Query-Driven Storage Layout for Geospatial Data

City authorities need to analyze urban geospatial data to improve transp...
research
12/24/2021

Multi-relation Graph Summarization

Graph summarization is beneficial in a wide range of applications, such ...
