Probery: A Probability-based Incomplete Query Optimization for Big Data

01/01/2019
by Jie Song, et al.

Query optimization has become a major concern in big data management, especially for NoSQL databases. Approximate queries improve query performance at the cost of accuracy; sampling approaches, for example, trade query completeness for efficiency. In contrast, we propose an uncertain measure of query completeness, called Probability of query Completeness (PC for short). PC is the probability that a query's results contain all records that satisfy the query. For example, PC=0.95 guarantees that no more than 5 out of 100 queries are incomplete, but it does not guarantee how incomplete those queries are. We trade PC for query performance, and experiments show that a small loss of PC doubles query performance. The proposed Probery (PROBability-based data quERY) uses this uncertainty of query completeness to accelerate OLTP queries. This paper presents the data and probability models, the probability-based data placement and query processing, and the Apache Drill-based implementation of Probery. In the experiments, we first verify that the percentage of complete queries is larger than the given PC confidence in various cases, that is, that the PC guarantee is valid. We then compare Probery with Drill, Impala and Hive in terms of query performance. The results indicate that Drill-based Probery performs as fast as Drill for complete queries, and is on average 1.8x, 1.3x and 1.6x faster than Drill, Impala and Hive, respectively, for possibly complete queries.
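
To make the PC trade-off concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simplified model in which a single matching record lies in exactly one partition, and in which the probability of each partition holding it is known. The function name `choose_partitions` and the example probabilities are hypothetical. Under that assumption, a query can stop scanning as soon as the scanned partitions cover the requested PC.

```python
def choose_partitions(probabilities, pc):
    """Pick the fewest partitions whose combined probability reaches pc.

    Assumption (not from the paper): probabilities[i] is the chance that
    the single matching record is stored in partition i, and the values
    sum to 1. Returns (chosen partition indices, covered probability).
    """
    if not 0.0 <= pc <= 1.0:
        raise ValueError("pc must be between 0 and 1")

    # Scan the most likely partitions first so fewer scans are needed.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i],
                    reverse=True)

    chosen, covered = [], 0.0
    for index in ranked:
        # Small tolerance guards against floating-point rounding.
        if covered >= pc - 1e-12:
            break
        chosen.append(index)
        covered += probabilities[index]
    return chosen, covered


# Hypothetical distribution over four partitions.
partition_probabilities = [0.6, 0.25, 0.1, 0.05]

chosen, covered = choose_partitions(partition_probabilities, 0.95)
print(chosen)  # [0, 1, 2]
```

In this toy setting, lowering PC from 1.0 to 0.95 skips the least likely partition, which is the kind of saving the abstract refers to when it trades PC for query performance.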


research
03/05/2020

LAQP: Learning-based Approximate Query Processing

Querying on big data is a challenging task due to the rapid growth of da...
research
08/02/2018

Diversification on Big Data in Query Processing

Recently, in the area of big data, some popular applications such as web...
research
10/13/2022

Soundness and Completeness of SPARQL Query Containment Solver SpeCS

Tool SPECS implements an efficient automated approach for reasoning abou...
research
07/22/2017

Possible and Certain Answers for Queries over Order-Incomplete Data

To combine and query ordered data from multiple sources, one needs to ha...
research
07/31/2018

Improve3C: Data Cleaning on Consistency and Completeness with Currency

Data quality plays a key role in big data management today. With the exp...
research
12/26/2018

QuickSel: Quick Selectivity Learning with Mixture Models

Estimating the selectivity of a query is a key step in almost any cost-b...
research
01/18/2014

Completeness Guarantees for Incomplete Ontology Reasoners: Theory and Practice

To achieve scalability of query answering, the developers of Semantic We...
