Entity-Switched Datasets: An Approach to Auditing the In-Domain Robustness of Named Entity Recognition Models

04/08/2020
by   Oshin Agarwal, et al.

Named entity recognition systems perform well on standard datasets comprising English news. But given the paucity of data, it is difficult to draw conclusions about the robustness of systems with respect to recognizing a diverse set of entities. We propose a method for auditing the in-domain robustness of systems, focusing specifically on differences in performance due to the national origin of entities. We create entity-switched datasets, in which named entities in the original texts are replaced by plausible named entities of the same type but of different national origin. We find that the performance of state-of-the-art systems varies widely even in-domain: in the same context, entities from certain origins are more reliably recognized than entities from elsewhere. Systems perform best on American and Indian entities, and worst on Vietnamese and Indonesian entities. This auditing approach can facilitate the development of more robust named entity recognition systems, and will allow research in this area to consider fairness criteria that have received heightened attention in other predictive technology work.
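To make the idea of entity switching concrete, the following is a minimal sketch, not the authors' implementation. It assumes a sentence is given as tokens with BIO tags and that a replacement lexicon maps each entity type to candidate names from one national origin. The function names (`find_spans`, `switch_entities`), the lexicon layout, and the example names are all hypothetical. How "plausible" replacements are selected is not described in this abstract, so the sketch simply samples at random.

```python
import random
from typing import Dict, List, Tuple


def find_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for each BIO entity span; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (start is None or tag[2:] != etype)
        )
        if starts_new:
            if start is not None:
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif tag == "O":
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


def switch_entities(
    tokens: List[str],
    tags: List[str],
    lexicon: Dict[str, List[str]],  # assumed layout: entity type -> candidate names
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Replace each entity with a same-type name from the lexicon.

    The surrounding context is left untouched, and tags are regenerated
    because the replacement may have a different number of tokens.
    """
    out_tokens, out_tags = [], []
    cursor = 0
    for start, end, etype in find_spans(tags):
        # Copy the unchanged context before this entity.
        out_tokens.extend(tokens[cursor:start])
        out_tags.extend(tags[cursor:start])
        candidates = lexicon.get(etype)
        if candidates:
            new_name = rng.choice(candidates).split()
            out_tokens.extend(new_name)
            out_tags.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(new_name) - 1))
        else:
            # No replacement available for this type: keep the original entity.
            out_tokens.extend(tokens[start:end])
            out_tags.extend(tags[start:end])
        cursor = end
    out_tokens.extend(tokens[cursor:])
    out_tags.extend(tags[cursor:])
    return out_tokens, out_tags


if __name__ == "__main__":
    # Hypothetical example sentence and lexicon, for illustration only.
    tokens = ["John", "Smith", "visited", "Boston", "."]
    tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
    vietnamese = {"PER": ["Nguyen Van An"], "LOC": ["Hanoi"]}
    print(switch_entities(tokens, tags, vietnamese, random.Random(0)))
```

Under these assumptions, an audit would run the same tagger on the original sentences and on one switched copy per origin, then compare scores. Because the context is identical across copies, any gap is attributable to the entities themselves.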


research
10/19/2018

pioNER: Datasets and Baselines for Armenian Named Entity Recognition

In this work, we tackle the problem of Armenian named entity recognition...
research
05/18/2020

Improving Named Entity Recognition in Tor Darknet with Local Distance Neighbor Feature

Name entity recognition in noisy user-generated texts is a difficult tas...
research
05/24/2020

MASK: A flexible framework to facilitate de-identification of clinical texts

Medical health records and clinical summaries contain a vast amount of i...
research
04/01/2019

Recognizing Musical Entities in User-generated Content

Recognizing Musical Entities is important for Music Information Retrieva...
research
10/26/2018

Named Person Coreference in English News

People are often entities of interest in tasks such as search and inform...
research
05/14/2016

Occurrence Statistics of Entities, Relations and Types on the Web

The problem of collecting reliable estimates of occurrence of entities o...
research
04/15/2021

Regularizing Models via Pointwise Mutual Information for Named Entity Recognition

In Named Entity Recognition (NER), pre-trained language models have been...
