CrossNER: Evaluating Cross-Domain Named Entity Recognition

12/08/2020
by Zihan Liu, et al.

Cross-domain named entity recognition (NER) models can cope with the scarcity of NER samples in target domains. However, most existing NER benchmarks lack domain-specialized entity types or do not focus on a particular domain, which makes cross-domain evaluation less effective. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully labeled collection of NER data spanning five diverse domains, with specialized entity categories for each domain. We also provide a domain-related corpus, since using it to continue pre-training language models (domain-adaptive pre-training) is effective for domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and different pre-training strategies for domain-adaptive pre-training on the cross-domain task. Results show that focusing on the fraction of the corpus containing domain-specialized entities and using a more challenging pre-training strategy in domain-adaptive pre-training both benefit NER domain adaptation, and our proposed method consistently outperforms existing cross-domain NER baselines. Nevertheless, the experiments also illustrate how challenging this cross-domain NER task remains. We hope that our dataset and baselines will catalyze research in NER domain adaptation. The code and data are available at https://github.com/zliucr/CrossNER.
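
The idea of restricting domain-adaptive pre-training to the part of the corpus that mentions domain-specialized entities can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe how the selection is done, so the function name `select_entity_sentences`, the term list, and the toy corpus below are all hypothetical, and simple case-insensitive term matching is assumed as the selection rule.

```python
import re
from typing import Iterable, List


def select_entity_sentences(
    sentences: Iterable[str], entity_terms: Iterable[str]
) -> List[str]:
    """Keep only sentences mentioning at least one term from entity_terms.

    Hypothetical helper: term matching is an assumed stand-in for whatever
    selection rule a real pipeline would use.
    """
    terms = [t for t in entity_terms if t.strip()]
    if not terms:
        return []
    # Longer terms first so multi-word entities win inside the alternation.
    terms.sort(key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]


# Toy data, invented purely for illustration.
corpus = [
    "The band released a new album of jazz standards.",
    "It rained for most of the afternoon.",
    "Gradient descent minimizes the loss function.",
]
music_terms = ["jazz", "album"]

selected = select_entity_sentences(corpus, music_terms)
for sentence in selected:
    print(sentence)
```

Under these assumptions, only the first sentence is kept, and the reduced corpus would then be fed to whichever pre-training objective is being compared.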

02/14/2020

Zero-Resource Cross-Domain Named Entity Recognition

Existing models for cross-domain named entity recognition (NER) rely on ...
07/02/2021

Data Centric Domain Adaptation for Historical Text with OCR Errors

We propose new methods for in-domain and cross-domain Named Entity Recog...
04/24/2020

Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling

As an essential task in task-oriented dialog systems, slot filling requi...
06/07/2022

Searching for Optimal Subword Tokenization in Cross-domain NER

Input distribution shift is one of the vital problems in unsupervised do...
05/13/2021

Cross-Domain Contract Element Extraction with a Bi-directional Feedback Clause-Element Relation Network

Contract element extraction (CEE) is the novel task of automatically ide...
03/22/2022

A Broad Study of Pre-training for Domain Generalization and Adaptation

Deep models must learn robust and transferable representations in order ...
05/24/2021

DaN+: Danish Nested Named Entities and Lexical Normalization

This paper introduces DaN+, a new multi-domain corpus and annotation gui...

Code Repositories

CrossNER

CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)



mask_keyword_pretrain_crossner




DRUG_CROSSNER

Adaptation of CrossNER for drug NER in Darknet markets

