An Iterative Scheme for Leverage-based Approximate Aggregation

11/06/2017
by Shanshan Han, et al.

The current explosion of data poses great challenges to the efficiency and accuracy of approximate aggregation. To address this problem, we propose a novel approach that computes aggregation answers with high accuracy using only a small share of the data. We introduce leverages to reflect the individual differences among samples from a statistical perspective. Two kinds of estimators, the leverage-based estimator and the sketch estimator (a "rough picture" of the aggregation answer), constrain each other and are iteratively improved according to the actual conditions until their difference falls below a threshold. The iteration mechanism and the leverages together give our approach its high accuracy. Moreover, some features make our approach well suited to big data: it does not require recording the sampled data, and it is easy to extend to various execution modes (such as the online mode). Experiments show that our approach performs well: compared with uniform sampling, it achieves high-quality answers with only 1/3 of the sample size.
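The abstract does not define the leverages or the exact update rules, so the following is only a toy sketch of the general idea (two estimates refined round by round until they agree), not the authors' algorithm. Everything in it is an assumption made for illustration: the function name `iterative_average`, the use of a plain running mean as the "sketch" estimate, and the distance-based weight that stands in for a leverage. It keeps only running sums, so no sampled values are stored between rounds.

```python
import random


def iterative_average(data, batch_size=100, threshold=0.5, max_rounds=50, seed=0):
    """Toy illustration: refine two estimates of AVG(data) until they agree.

    Hypothetical stand-in for the paper's scheme; the weighting rule below
    is an assumption, not the leverage definition used by the authors.
    """
    rng = random.Random(seed)
    n = 0            # number of samples seen so far
    total = 0.0      # running sum for the sketch estimate (plain mean)
    dev_total = 0.0  # running sum of |x - sketch|, used as a spread scale
    w_total = 0.0    # running sum of weights
    wx_total = 0.0   # running sum of weight * value
    sketch = weighted = 0.0

    for rounds in range(1, max_rounds + 1):
        # Draw a fresh batch uniformly with replacement; it is discarded
        # at the end of the round, only the running sums survive.
        batch = [rng.choice(data) for _ in range(batch_size)]

        # Update the sketch estimate: a plain running mean.
        n += len(batch)
        total += sum(batch)
        sketch = total / n

        # Update the spread scale from the same batch.
        dev_total += sum(abs(x - sketch) for x in batch)
        scale = dev_total / n

        # Assumed weight: samples far from the sketch count for less.
        for x in batch:
            w = 1.0 if scale == 0 else 1.0 / (1.0 + abs(x - sketch) / scale)
            w_total += w
            wx_total += w * x
        weighted = wx_total / w_total

        # Stop once the two estimates differ by less than the threshold.
        if abs(weighted - sketch) < threshold:
            break

    return weighted, sketch, rounds


if __name__ == "__main__":
    values = [float(v) for v in range(1, 1001)]
    print(iterative_average(values, threshold=1.0, max_rounds=20))
```

On skewed data the two toy estimates need not converge to each other, which is why the loop is capped by `max_rounds`; how the paper's estimators handle this is not stated in the abstract.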

