Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization

01/22/2020
by   Bruno Taillé, et al.

Contextualized embeddings use unsupervised language model pretraining to compute word representations that depend on their context. This is intuitively useful for generalization, especially in Named-Entity Recognition, where it is crucial to detect mentions never seen during training. However, standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions. In this paper, we perform an empirical analysis of the generalization capabilities of state-of-the-art contextualized embeddings by separating mentions by novelty and with out-of-domain evaluation. We show that they are particularly beneficial for detecting unseen mentions, especially out-of-domain. For models trained on CoNLL03, language model contextualization leads to a +1.2 +13...
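The evaluation described above hinges on partitioning test mentions by novelty, i.e. whether a mention's surface form also appears in the training data. A minimal sketch of that partitioning is shown below; the function name and the lowercased exact-match comparison are illustrative assumptions, not the paper's exact matching scheme.

```python
def partition_by_novelty(train_mentions, test_mentions):
    """Split test mentions into 'seen' (surface form occurs in the
    training data) and 'unseen' (novel) groups, using lowercased
    exact-match comparison (an assumed matching scheme)."""
    seen_forms = {m.lower() for m in train_mentions}
    seen, unseen = [], []
    for mention in test_mentions:
        (seen if mention.lower() in seen_forms else unseen).append(mention)
    return seen, unseen

# Toy example with CoNLL03-style entity mentions:
train = ["European Union", "Germany", "Peter Blackburn"]
test = ["Germany", "BRUSSELS", "Peter Blackburn", "Werner Zwingmann"]

seen, unseen = partition_by_novelty(train, test)
print(seen)    # ['Germany', 'Peter Blackburn']
print(unseen)  # ['BRUSSELS', 'Werner Zwingmann']
```

Scoring a model separately on each group exposes how much of its benchmark F1 comes from memorized surface forms versus genuine contextual generalization.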


Related research

- 06/01/2017: Morphological Embeddings for Named Entity Recognition in Morphologically Rich Languages
  In this work, we present new state-of-the-art results of 93.59, for Turk...

- 09/14/2023: Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
  In spite of the excellent strides made by end-to-end (E2E) models in spe...

- 06/15/2022: Contextualization and Generalization in Entity and Relation Extraction
  During the past decade, neural networks have become prominent in Natural...

- 12/01/2021: Building astroBERT, a language model for Astronomy & Astrophysics
  The existing search tools for exploring the NASA Astrophysics Data Syste...

- 10/31/2016: Named Entity Recognition for Novel Types by Transfer Learning
  In named entity recognition, we often don't have a large in-domain train...

- 06/09/2018: Robust Lexical Features for Improved Neural Network Named-Entity Recognition
  Neural network approaches to Named-Entity Recognition reduce the need fo...

- 12/21/2022: Uncontrolled Lexical Exposure Leads to Overestimation of Compositional Generalization in Pretrained Models
  Human linguistic capacity is often characterized by compositionality and...
