DaN+: Danish Nested Named Entities and Lexical Normalization

05/24/2021
by Barbara Plank, et al.

This paper introduces DaN+, a new multi-domain corpus and annotation guidelines for Danish nested named entities (NEs) and lexical normalization, to support research on cross-lingual, cross-domain learning for a less-resourced language. We empirically assess three strategies to model the two-layer Named Entity Recognition (NER) task. We compare transfer capabilities from German versus in-language annotation from scratch. We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER. Our results show that 1) the most robust strategy is multi-task learning, which is rivaled by multi-label decoding, 2) BERT-based NER models are sensitive to domain shifts, and 3) in-language BERT and lexical normalization are the most beneficial on the least canonical data. Our results also show that an out-of-domain setup remains challenging, while performance on news plateaus quickly. This highlights the importance of cross-domain evaluation of cross-lingual transfer.

