Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction

05/12/2022
by Tianshu Wang, et al.

Entity matching (EM) is the most critical step for entity resolution (ER). While current deep learning-based methods achieve very impressive performance on standard EM benchmarks, their performance in real-world applications is far more disappointing. In this paper, we highlight that this gap between reality and ideality stems from an unreasonable benchmark construction process, which is inconsistent with the nature of entity matching and therefore leads to biased evaluations of current EM approaches. To this end, we build a new EM corpus and re-construct EM benchmarks to challenge critical assumptions implicit in the previous benchmark construction process, changing step by step the restricted entities, balanced labels, and single-modal records of previous benchmarks into the open entities, imbalanced labels, and multimodal records of an open environment. Experimental results demonstrate that the assumptions made in the previous benchmark construction process do not hold in the open environment; they conceal the main challenges of the task and therefore lead to significant overestimation of current progress in entity matching. The constructed benchmarks and code are publicly released.


research
08/02/2023

MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching

Entity Matching (EM), which aims to identify all entity pairs referring ...
research
07/11/2022

PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching

Entity Matching (EM), which aims to identify whether two entity records ...
research
06/15/2021

Machamp: A Generalized Entity Matching Benchmark

Entity Matching (EM) refers to the problem of determining whether two di...
research
07/03/2023

A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms

Entity resolution (ER) is the process of identifying records that refer ...
research
07/22/2021

Frost: Benchmarking and Exploring Data Matching Results

"Bad" data has a direct impact on 88 losing 12 representations of the sa...
research
06/10/2022

Machop: an End-to-End Generalized Entity Matching Framework

Real-world applications frequently seek to solve a general form of the E...
research
01/23/2023

WDC Products: A Multi-Dimensional Entity Matching Benchmark

The difficulty of an entity matching task depends on a combination of mu...
