Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing

08/24/2021
by Vera Provatorova, et al.

Entity disambiguation (ED) is the last step of entity linking (EL), when candidate entities are reranked according to the context they appear in. All datasets for training and evaluating models for EL consist of convenience samples, such as news articles and tweets, that propagate the prior probability bias of the entity distribution towards more frequently occurring entities. It was previously shown that the performance of the EL systems on such datasets is overestimated since it is possible to obtain higher accuracy scores by merely learning the prior. To provide a more adequate evaluation benchmark, we introduce the ShadowLink dataset, which includes 16K short text snippets annotated with entity mentions. We evaluate and report the performance of popular EL systems on the ShadowLink benchmark. The results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation, demonstrating the effects of prior probability bias and entity overshadowing.
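To make the notion of "merely learning the prior" concrete, the following is a minimal sketch of a prior-only baseline: it estimates how often each mention string refers to each entity and always predicts the most frequent one. The toy annotations, entity identifiers, and function names (`estimate_prior`, `predict_by_prior`, `accuracy`) are hypothetical illustrations, not data or code from ShadowLink or from any of the evaluated EL systems.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations (mention string, gold entity).
# These are illustrative only and are not drawn from ShadowLink.
TRAIN = [
    ("Michael Jordan", "Michael_Jordan_(basketball)"),
    ("Michael Jordan", "Michael_Jordan_(basketball)"),
    ("Michael Jordan", "Michael_Jordan_(basketball)"),
    ("Michael Jordan", "Michael_I._Jordan_(scientist)"),
    ("Paris", "Paris_(France)"),
    ("Paris", "Paris_(France)"),
    ("Paris", "Paris,_Texas"),
]


def estimate_prior(pairs):
    """Estimate P(entity | mention) from (mention, entity) counts."""
    counts = defaultdict(Counter)
    for mention, entity in pairs:
        counts[mention][entity] += 1
    return {
        mention: {e: c / sum(ctr.values()) for e, c in ctr.items()}
        for mention, ctr in counts.items()
    }


def predict_by_prior(prior, mention):
    """Ignore context entirely and return the most probable entity."""
    candidates = prior.get(mention)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def accuracy(prior, pairs):
    """Fraction of pairs for which the prior-only prediction is correct."""
    if not pairs:
        return 0.0
    correct = sum(predict_by_prior(prior, m) == e for m, e in pairs)
    return correct / len(pairs)


PRIOR = estimate_prior(TRAIN)

# Split the annotations by whether the gold entity is the most common
# sense of its mention ("top") or is overshadowed by a more common one.
TOP = [(m, e) for m, e in TRAIN if predict_by_prior(PRIOR, m) == e]
SHADOW = [(m, e) for m, e in TRAIN if predict_by_prior(PRIOR, m) != e]

print("overall:", accuracy(PRIOR, TRAIN))
print("top entities:", accuracy(PRIOR, TOP))
print("overshadowed entities:", accuracy(PRIOR, SHADOW))
```

On a sample dominated by common entities, this context-free baseline scores well overall while failing on every overshadowed entity by construction, which is why the gap is only visible when accuracy is reported separately for more and less common entities.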

