Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark]

04/24/2023
by Alexandros Zeakis, et al.

Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving language models to improve effectiveness. These techniques are applied to both main steps of ER, i.e., blocking and matching. Several pre-trained embeddings have been tested, the most popular being fastText and variants of the BERT model. However, there is no detailed analysis of their pros and cons. To fill this gap, we perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets. First, we assess their vectorization overhead for converting all input entities into dense embedding vectors. Second, we investigate their blocking performance, including a detailed scalability analysis and a comparison with the state-of-the-art deep learning-based blocking method. Third, we assess their relative performance in both supervised and unsupervised matching. Our experimental results provide novel insights into the strengths and weaknesses of the main language models, helping researchers and practitioners select the most suitable ones in practice.
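To make the blocking and unsupervised matching steps concrete, here is a minimal sketch, not the paper's method. It assumes each entity has already been vectorized by some language model (fastText or a BERT variant). The tiny three-dimensional vectors below are hypothetical stand-ins for such embeddings. Blocking keeps, for every entity of one collection, its k most cosine-similar entities in the other collection. Unsupervised matching then labels a candidate pair as a match if its similarity reaches a threshold. The function names `knn_blocking` and `unsupervised_match`, the value of `k`, and the threshold are illustrative assumptions. A brute-force search is used here in place of the approximate nearest-neighbour indexes that real systems typically rely on.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_blocking(left, right, k=1):
    """Hypothetical embedding-based blocking: for each entity in `left`,
    keep its k most similar entities in `right` as candidate pairs
    (brute-force nearest-neighbour search over all of `right`)."""
    candidates = set()
    for i, u in left.items():
        ranked = sorted(right, key=lambda j: cosine(u, right[j]), reverse=True)
        for j in ranked[:k]:
            candidates.add((i, j))
    return candidates


def unsupervised_match(candidates, left, right, threshold=0.9):
    """Hypothetical unsupervised matching: a candidate pair is a match
    if the cosine similarity of its embeddings reaches `threshold`."""
    return {(i, j) for i, j in candidates if cosine(left[i], right[j]) >= threshold}


# Toy embeddings (assumed, not from the paper) for two entity collections.
left = {"a1": [1.0, 0.0, 0.1], "a2": [0.0, 1.0, 0.0]}
right = {"b1": [0.9, 0.1, 0.1], "b2": [0.1, 0.9, 0.2], "b3": [0.0, 0.0, 1.0]}

# Blocking yields the pairs (a1, b1) and (a2, b2).
candidates = knn_blocking(left, right, k=1)
# With a strict threshold only (a1, b1) survives (similarity about 0.99 vs 0.97).
matches = unsupervised_match(candidates, left, right, threshold=0.98)
```

Blocking prunes the quadratic space of all pairs down to `len(left) * k` candidates, so the cost of the matching step depends on `k` rather than on the size of `right`. This is why blocking effectiveness and scalability are evaluated separately from matching.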


