(Almost) All of Entity Resolution

08/10/2020
by Olivier Binette, et al.

Whether the goal is to estimate the number of people who live in a congressional district, to estimate the number of individuals who have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications share a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a task commonly known as record linkage, de-duplication, or entity resolution. In this article, we review motivational applications and seminal papers that have led to the growth of this area. Specifically, we review the foundational work that began in the 1940s and 1950s and has led to modern probabilistic record linkage. We review clustering approaches to entity resolution, semi- and fully supervised methods, and canonicalization, which are used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Finally, we discuss current research topics of practical importance.

research
09/14/2015

A Practioner's Guide to Evaluating Entity Resolution Results

Entity resolution (ER) is the task of identifying records belonging to t...
research
12/27/2017

Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases

Accurate and efficient entity resolution is an open challenge of particu...
research
10/07/2017

Unique Entity Estimation with Application to the Syrian Conflict

Entity resolution identifies and removes duplicate entities in large, no...
research
05/30/2018

Anaphora and Coreference Resolution: A Review

Entity resolution aims at resolving repeated references to an entity in ...
research
12/12/2021

Graph-based hierarchical record clustering for unsupervised entity resolution

Here we study the problem of matched record clustering in unsupervised e...
research
10/11/2018

Probabilistic Blocking with An Application to the Syrian Conflict

Entity resolution seeks to merge databases as to remove duplicate entrie...
research
02/16/2021

VIEW: a framework for organization level interactive record linkage to support reproducible data science

Objective: To design and evaluate a general framework for interactive re...
