Online Topic-Aware Entity Resolution Over Incomplete Data Streams (Technical Report)

03/15/2021
by   Weilong Ren, et al.

In many real-world applications, such as data integration, social network analysis, and the Semantic Web, entity resolution (ER) is an important and fundamental problem: it identifies and links records that refer to the same real-world entity across various data sources. While prior works usually consider ER over static and complete data, in practice application data are often collected in a streaming fashion and frequently have missing attributes (due to the inaccuracy of data extraction techniques). Therefore, in this paper, we formulate and tackle a novel problem, topic-aware entity resolution over incomplete data streams (TER-iDS), which imputes incomplete tuples online and detects pairs of topic-related matching entities from incomplete data streams. To tackle the TER-iDS problem effectively and efficiently, we propose an imputation strategy, carefully design pruning strategies as well as indexes/synopses, and develop an efficient TER-iDS algorithm via index joins. Extensive experiments over real data sets evaluate the effectiveness and efficiency of our proposed TER-iDS approach.
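
To make the problem statement concrete, the following is a minimal Python sketch of the three ingredients named in the abstract: online imputation, topic filtering, and pairwise matching over a sliding window. It is not the TER-iDS algorithm from the report. The helper names (`impute`, `is_topical`, `resolve_stream`), the Jaccard-based similarity, the nearest-neighbour imputation, the rule that both records of a pair must contain a topic keyword, and the brute-force window scan (in place of the paper's pruning strategies and index joins) are all assumptions chosen for illustration.

```python
from collections import deque


def tokens(value):
    """Lower-cased token set of an attribute value; empty for a missing (None) value."""
    return set(value.lower().split()) if value else set()


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def record_similarity(r1, r2):
    """Mean per-attribute Jaccard similarity over the attributes of r1."""
    if not r1:
        return 0.0
    return sum(jaccard(tokens(r1[k]), tokens(r2.get(k))) for k in r1) / len(r1)


def impute(record, neighbours):
    """Fill each missing attribute from the most similar neighbour that has it.

    Assumption: a simple nearest-neighbour stand-in, not the paper's imputation
    strategy. An attribute stays None if no neighbour shares any observed token.
    """
    observed = {k: v for k, v in record.items() if v is not None}
    filled = dict(record)
    for attr, value in record.items():
        if value is not None:
            continue
        best_value, best_sim = None, 0.0
        for other in neighbours:
            if other.get(attr) is None:
                continue
            sim = record_similarity(observed, other)
            if sim > best_sim:
                best_value, best_sim = other[attr], sim
        filled[attr] = best_value
    return filled


def is_topical(record, keywords):
    """True if any attribute value contains at least one topic keyword."""
    return any(tokens(v) & keywords for v in record.values())


def resolve_stream(stream, keywords, window_size=3, threshold=0.5):
    """Return (older_id, newer_id) pairs of topic-related matching records."""
    window = deque(maxlen=window_size)  # sliding window of imputed records
    matches = []
    for rid, record in stream:
        record = impute(record, [r for _, r in window])
        for other_id, other in window:
            # Assumed topic rule: both records must mention a keyword.
            if not (is_topical(record, keywords) and is_topical(other, keywords)):
                continue
            if record_similarity(record, other) >= threshold:
                matches.append((other_id, rid))
        window.append((rid, record))
    return matches


# Hypothetical three-record stream; the last record has a missing attribute.
stream = [
    ("a1", {"name": "Apple iPhone 12", "desc": "smartphone 64GB black"}),
    ("b1", {"name": "Samsung Galaxy S21", "desc": "smartphone 128GB grey"}),
    ("a2", {"name": "Apple iPhone 12", "desc": None}),
]
print(resolve_stream(stream, {"smartphone"}))  # [('a1', 'a2')]
```

In this toy stream, record `a2` only qualifies as topic-related after its missing `desc` value is imputed from `a1`, which illustrates why imputation and matching have to be handled together. The quadratic scan of the window is precisely the cost that pruning and indexing are meant to avoid.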


research
09/24/2019

Skyline Queries Over Incomplete Data Streams (Technical Report)

Nowadays, efficient and effective processing over massive stream data ha...
research
08/23/2019

Efficient Join Processing Over Incomplete Data Streams (Technical Report)

For decades, the join operator over fast data streams has always drawn m...
research
08/30/2017

Distributed Holistic Clustering on Linked Data

Link discovery is an active field of research to support data integratio...
research
05/10/2021

Probabilistic Top-k Dominating Queries in Distributed Uncertain Databases (Technical Report)

In many real-world applications such as business planning and sensor dat...
research
05/05/2022

Dangling-Aware Entity Alignment with Mixed High-Order Proximities

We study dangling-aware entity alignment in knowledge graphs (KGs), whic...
research
01/29/2019

Semantic and Influence aware k-Representative Queries over Social Streams

Massive volumes of data continuously generated on social platforms have ...
research
01/28/2022

Boosting Entity Mention Detection for Targetted Twitter Streams with Global Contextual Embeddings

Microblogging sites, like Twitter, have emerged as ubiquitous sources of...
