Probabilistic Top-k Dominating Queries in Distributed Uncertain Databases (Technical Report)

05/10/2021
by Niranjan Rai, et al.
In many real-world applications such as business planning and sensor data monitoring, one important yet challenging task is to rank objects (e.g., products, documents, or spatial objects) based on their ranking scores and efficiently return the objects with the highest scores. In practice, due to the unreliability of data sources, many real-world objects contain noise and are thus imprecise and uncertain. In this paper, we study the problem of the probabilistic top-k dominating (PTD) query on such large-scale uncertain data in a distributed environment, which retrieves the k uncertain objects from distributed uncertain databases (on multiple distributed servers) that have the largest ranking scores with high confidence. To tackle the distributed PTD problem efficiently, we propose a MapReduce framework for processing distributed PTD queries over distributed uncertain databases. Within this framework, we design effective pruning strategies to filter out false alarms in the distributed setting, propose cost-model-based index distribution mechanisms over servers, and develop efficient distributed PTD query processing algorithms. Extensive experiments demonstrate the efficiency and effectiveness of our distributed PTD approach on both real and synthetic data sets under various experimental settings.
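To make the query concrete, the following is a minimal Python sketch of a top-k dominating query over uncertain objects, arranged in a map/reduce shape. It is not the paper's algorithm: it omits the pruning strategies, indexes, and cost model, and it ranks by expected dominating score rather than by a confidence threshold. The data model (each object as a list of weighted instances), the convention that smaller values are better, and all function names are assumptions made for illustration.

```python
from itertools import product

def dominates(a, b):
    """Point a dominates b: no worse in every dimension, strictly better in one.
    Assumption: smaller coordinate values are better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def prob_dominates(u, v):
    """Probability that uncertain object u dominates uncertain object v.
    Each object is a list of (point, probability) instances, assumed independent."""
    return sum(p * q for (a, p), (b, q) in product(u, v) if dominates(a, b))

def map_partial_scores(candidates, local_objects):
    """Map step (one call per server): for every candidate, sum the probabilities
    of dominating each object stored on this server."""
    return {
        cid: sum(prob_dominates(cand, obj)
                 for oid, obj in local_objects.items() if oid != cid)
        for cid, cand in candidates.items()
    }

def reduce_top_k(partials, k):
    """Reduce step: add the per-server partial scores and keep the k largest."""
    totals = {}
    for partial in partials:
        for cid, score in partial.items():
            totals[cid] = totals.get(cid, 0.0) + score
    # Sort by descending score, breaking ties by object id for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

def ptd_top_k(partitions, k):
    """Naive driver: every object is a candidate and is scored on every server."""
    candidates = {oid: obj for part in partitions for oid, obj in part.items()}
    partials = [map_partial_scores(candidates, part) for part in partitions]
    return reduce_top_k(partials, k)

if __name__ == "__main__":
    # Two hypothetical servers, each holding two uncertain 2-D objects.
    partitions = [
        {"A": [((1, 1), 1.0)], "B": [((2, 2), 0.5), ((0, 3), 0.5)]},
        {"C": [((3, 3), 1.0)], "D": [((2, 4), 1.0)]},
    ]
    print(ptd_top_k(partitions, 2))
```

The decomposition works because an object's expected dominating score is a sum over all other objects, so each server can contribute a partial sum independently. This naive version ships every candidate to every server, which is the cost that pruning and index distribution of the kind described in the abstract would aim to reduce.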


research
09/24/2019

Skyline Queries Over Incomplete Data Streams (Technical Report)

Nowadays, efficient and effective processing over massive stream data ha...
research
12/12/2021

Probabilistic Counting in Uncertain Spatial Databases using Generating Functions

Location data is inherently uncertain for many reasons including 1) impr...
research
02/17/2023

Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data (Extended version)

Uncertainty arises naturally in many application domains due to, e.g., da...
research
06/01/2019

Probabilistic Top-k Dominating Query Monitoring over Multiple Uncertain IoT Data Streams in Edge Computing Environments

Extracting the valuable features and information in Big Data has become ...
research
03/15/2021

Online Topic-Aware Entity Resolution Over Incomplete Data Streams (Technical Report)

In many real applications such as the data integration, social network a...
research
08/23/2019

Efficient Join Processing Over Incomplete Data Streams (Technical Report)

For decades, the join operator over fast data streams has always drawn m...
research
06/05/2023

Fast Search-By-Classification for Large-Scale Databases Using Index-Aware Decision Trees and Random Forests

The vast amounts of data collected in various domains pose great challen...
