A Theoretical Analysis of First Heuristics of Crowdsourced Entity Resolution

02/03/2017
by   Arya Mazumdar, et al.

Entity resolution (ER) is the task of identifying all records in a database that refer to the same underlying entity and are therefore duplicates of each other. Due to the inherent ambiguity of data representation and poor data quality, ER is a challenging task for any automated process. As a remedy, human-powered ER via crowdsourcing has become popular in recent years. Using the crowd to answer queries is costly and time-consuming. Furthermore, crowd answers can often be faulty. Therefore, crowd-based ER methods aim to minimize human participation without sacrificing quality, and they actively use a computer-generated similarity matrix. While some of these methods perform well in practice, no theoretical analysis exists for them, and their worst-case performance does not reflect the experimental findings. This creates a disparity in the understanding of the popular heuristics for this problem. In this paper, we make the first attempt to close this gap. We provide a thorough analysis of the prominent heuristic algorithms for crowd-based ER. We justify experimental observations with our analysis and information-theoretic lower bounds.

research
08/07/2017

T-Crowd: Effective Crowdsourcing for Tabular Data

Crowdsourcing employs human workers to solve computer-hard problems, suc...
research
06/22/2017

Clustering with Noisy Queries

In this paper, we initiate a rigorous theoretical study of clustering wi...
research
09/20/2021

Crowdsourcing Diverse Paraphrases for Training Task-oriented Bots

A prominent approach to build datasets for training task-oriented bots i...
research
01/19/2023

Reversing The Twenty Questions Game

Twenty questions is a widely popular verbal game. In recent years, many ...
research
03/31/2015

Crowdsourcing Feature Discovery via Adaptively Chosen Comparisons

We introduce an unsupervised approach to efficiently discover the underl...
research
04/12/2017

Real-time On-Demand Crowd-powered Entity Extraction

Output-agreement mechanisms such as ESP Game have been widely used in hu...
