Probery: A Probability-based Incomplete Query Optimization for Big Data

by Jie Song, et al.

Query optimization has become a major concern in big data management, especially for NoSQL databases. Approximate queries improve performance at the cost of accuracy; sampling approaches, for example, trade query completeness for efficiency. In contrast, we propose treating query completeness as an uncertain quantity, called the Probability of query Completeness (PC for short). PC is the probability that a query result contains all records satisfying the query. For example, PC=0.95 guarantees that no more than 5 out of 100 queries are incomplete, but it does not guarantee how incomplete those queries are. We trade off PC for query performance, and experiments show that a small loss of PC doubles query performance. The proposed Probery (PROBability-based data quERY) exploits this uncertainty of query completeness to accelerate OLTP queries. This paper presents the data and probability models, the probability-based data placement and query processing, and the Apache Drill-based implementation of Probery. In the experiments, we first show that the percentage of complete queries exceeds the given PC confidence in various cases, that is, that the PC guarantee is valid. We then compare Probery with Drill, Impala and Hive in terms of query performance. The results indicate that Drill-based Probery is as fast as Drill for complete queries, and on average 1.8x, 1.3x and 1.6x faster than Drill, Impala and Hive, respectively, for possibly complete queries.
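The PC=0.95 example in the abstract can be made concrete with a small check of the kind the validation experiment describes: count how many queries returned every satisfying record and compare that fraction with the target PC. The sketch below is only an illustration under stated assumptions. The function names (`is_complete`, `complete_fraction`, `pc_guarantee_holds`) and the representation of records as integer IDs are hypothetical and are not taken from Probery or Apache Drill.

```python
from typing import Sequence, Set


def is_complete(returned: Set[int], expected: Set[int]) -> bool:
    """A query is complete when every satisfying record was returned."""
    return expected <= returned


def complete_fraction(returned_sets: Sequence[Set[int]],
                      expected_sets: Sequence[Set[int]]) -> float:
    """Fraction of queries whose result contains all satisfying records."""
    if len(returned_sets) != len(expected_sets):
        raise ValueError("need one expected set per returned set")
    if not returned_sets:
        raise ValueError("need at least one query")
    complete = sum(
        is_complete(r, e) for r, e in zip(returned_sets, expected_sets)
    )
    return complete / len(returned_sets)


def pc_guarantee_holds(fraction: float, pc: float) -> bool:
    """Hypothetical check: observed complete fraction meets the target PC."""
    return fraction >= pc


if __name__ == "__main__":
    # Assumed toy workload: 100 queries, each with records {1, 2, 3} satisfying it.
    expected = [{1, 2, 3}] * 100
    # 96 queries return everything; 4 miss record 3 (how much they miss is not bounded by PC).
    returned = [{1, 2, 3}] * 96 + [{1, 2}] * 4
    fraction = complete_fraction(returned, expected)
    print(fraction, pc_guarantee_holds(fraction, pc=0.95))
```

Note that the check only counts whether each query is complete. It says nothing about how many records an incomplete query misses, which matches the abstract's statement that PC does not bound the degree of incompleteness.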






