Query-time Entity Resolution

10/31/2011
by I. Bhattacharya, et al.

Entity resolution is the problem of reconciling database references that correspond to the same real-world entities. Given the abundance of publicly available databases that have unresolved entities, we motivate the problem of query-time entity resolution: quick and accurate resolution for answering queries over such unclean databases at query time. Since collective entity resolution approaches, where related references are resolved jointly, have been shown to be more accurate than independent attribute-based resolution for off-line entity resolution, we focus on developing new algorithms for collective resolution for answering entity resolution queries at query time. For this purpose, we first formally show that, for collective resolution, precision and recall for individual entities follow a geometric progression as neighbors at increasing distances are considered. Unfolding this progression leads naturally to a two-stage "expand and resolve" query processing strategy. In this strategy, we first extract the related records for a query using two novel expansion operators, and then resolve the extracted records collectively. We then show how the same strategy can be adapted for query-time entity resolution by identifying and resolving only those database references that are the most helpful for processing the query. We validate our approach on two large real-world publication databases, where we show the usefulness of collective resolution and at the same time demonstrate the need for adaptive strategies for query processing. We then show how the same queries can be answered in real time using our adaptive approach while preserving the gains of collective resolution. In addition to experiments on real datasets, we use synthetically generated data to empirically demonstrate the validity of the performance trends predicted by our analysis of collective entity resolution over a wide range of structural characteristics in the data.
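To make the two-stage "expand and resolve" idea concrete, the following is a minimal Python sketch on a toy author-reference dataset. It is an illustration under stated assumptions, not the paper's algorithm: the data, the function names (`similar_named`, `co_occurring`, `expand`, `resolve`), the string similarity measure, the thresholds, and the bootstrap that treats identical names as co-referent are all hypothetical stand-ins, and the two expansion functions only approximate the role of the two expansion operators mentioned above.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy data: reference id -> (author name, paper id).
REFS = {
    1: ("W. Wang", "p1"), 2: ("C. Chen", "p1"),
    3: ("W. Wang", "p2"), 4: ("A. Ansari", "p2"),
    5: ("W. W. Wang", "p3"), 6: ("A. Ansari", "p3"),
    7: ("W. Wong", "p4"), 8: ("D. Dey", "p4"),
    9: ("L. Li", "p5"), 10: ("M. Mo", "p5"),
}

def name_sim(a, b):
    """Attribute similarity between two name strings, in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()

def similar_named(names, threshold=0.8):
    """Expansion by attribute: references whose name resembles any given name."""
    return {r for r, (n, _) in REFS.items()
            if any(name_sim(n, m) >= threshold for m in names)}

def co_occurring(ref_ids):
    """Expansion by relation: references that appear on the same papers."""
    papers = {REFS[r][1] for r in ref_ids}
    return {r for r, (_, p) in REFS.items() if p in papers}

def expand(query_name, depth=1):
    """Stage 1: extract the references relevant to the query, up to `depth` rounds."""
    relevant = similar_named({query_name})
    for _ in range(depth):
        relevant |= co_occurring(relevant)
        relevant |= similar_named({REFS[r][0] for r in relevant})
    return relevant

def resolve(ref_ids, alpha=0.5, min_attr=0.75, threshold=0.6):
    """Stage 2: greedy collective clustering of the extracted references only."""
    # Simplifying bootstrap (assumption): identical names start in one cluster.
    cluster_of = {r: REFS[r][0] for r in ref_ids}

    def members(c):
        return [r for r in ref_ids if cluster_of[r] == c]

    def neighbours(c):
        # Cluster labels of references sharing a paper with any member of c.
        return {cluster_of[s] for r in members(c) for s in ref_ids
                if s != r and REFS[s][1] == REFS[r][1]}

    def sim(c1, c2):
        attr = max(name_sim(REFS[a][0], REFS[b][0])
                   for a in members(c1) for b in members(c2))
        if attr < min_attr:          # never merge clearly different names
            return 0.0
        n1, n2 = neighbours(c1), neighbours(c2)
        union = n1 | n2
        rel = len(n1 & n2) / len(union) if union else 0.0
        return alpha * attr + (1 - alpha) * rel

    while True:
        clusters = sorted(set(cluster_of.values()))
        scored = [(sim(a, b), a, b) for a, b in combinations(clusters, 2)]
        best = max(scored, default=(0.0, None, None))
        if best[0] < threshold:
            break
        _, keep, drop = best
        for r in members(drop):      # merge the best-scoring pair, then rescore
            cluster_of[r] = keep

    groups = {}
    for r, c in cluster_of.items():
        groups.setdefault(c, set()).add(r)
    return sorted(groups.values(), key=min)

if __name__ == "__main__":
    print(resolve(expand("W. Wang", depth=1)))
```

In this toy setting the relational term is what separates the two near-miss names: "W. W. Wang" shares a co-author cluster with "W. Wang" and is merged with it, while "W. Wong", whose name is at least as close as a string, shares none and stays apart. Increasing `depth` pulls in more distant neighbors at higher cost, which is the trade-off the adaptive query-time strategy is meant to control.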


research
02/21/2020

Crowdsourced Collective Entity Resolution with Relational Match Propagation

Knowledge bases (KBs) store rich yet heterogeneous entities and facts. E...
research
05/26/2023

Combining Global and Local Merges in Logic-based Entity Resolution

In the recently proposed Lace framework for collective entity resolution...
research
03/13/2023

A Framework for Combining Entity Resolution and Query Answering in Knowledge Bases

We propose a new framework for combining entity resolution and query ans...
research
02/03/2022

QueryER: A Framework for Fast Analysis-Aware Deduplication over Dirty Data

In this work, we explore the problem of correctly and efficiently answer...
research
08/24/2020

On sampling from data with duplicate records

Data deduplication is the task of detecting records in a database that c...
research
03/11/2018

Entity Resolution and Federated Learning get a Federated Resolution

Consider two data providers, each maintaining records of different featu...
research
09/29/2017

Entity Consolidation: The Golden Record Problem

Four key processes in data integration are: data preparation (i.e., extr...
