Focusing on Possible Named Entities in Active Named Entity Label Acquisition

11/06/2021
by   Ali Osman Berk Sapci, et al.

Named entity recognition (NER) aims to identify mentions of named entities in unstructured text and classify them into predefined named entity classes. Even though deep learning-based pre-trained language models achieve good predictive performance, many domain-specific NER tasks still require a sufficient amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for NER tasks to minimize the annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose AL sentence query evaluation functions that pay more attention to possible positive tokens, and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains reveal that the proposed approaches reduce the number of annotated tokens while achieving better or comparable prediction performance compared with conventional methods.
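
To make the idea of a sentence query evaluation function that focuses on possible positive tokens more concrete, the following is a minimal sketch and not the authors' actual functions. It assumes each token comes with a predicted class distribution, treats a token as a "possible positive" when the probability of the outside (non-entity) class falls below a threshold, and scores a sentence by the mean entropy of those tokens. The names `token_entropy`, `sentence_score`, `select_queries`, `outside_index`, and `positive_threshold` are all hypothetical, and the plain averaging stands in for the paper's data-driven normalization, which the abstract does not specify.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_score(token_probs, outside_index=0, positive_threshold=0.5):
    """Mean entropy over tokens that might belong to a named entity.

    Assumption (not from the paper): a token is a "possible positive" when
    the probability of the outside class is below `positive_threshold`.
    Sentences with no such token get a score of 0.0.
    """
    candidates = [p for p in token_probs if p[outside_index] < positive_threshold]
    if not candidates:
        return 0.0
    return sum(token_entropy(p) for p in candidates) / len(candidates)


def select_queries(pool, budget):
    """Return indices of the `budget` highest-scoring sentences in the pool."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: sentence_score(pool[i]),
        reverse=True,
    )
    return ranked[:budget]
```

Under these assumptions, a sentence whose tokens are all confidently predicted as outside contributes nothing to the ranking, so the annotation budget is spent on sentences that are likely to contain uncertain entity tokens.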

research
09/11/2021

AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations

State-of-the-art Named Entity Recognition(NER) models rely heavily on la...
research
05/29/2023

E-NER: Evidential Deep Learning for Trustworthy Named Entity Recognition

Most named entity recognition (NER) systems focus on improving model per...
research
09/13/2018

On the Strength of Character Language Models for Multilingual Named Entity Recognition

Character-level patterns have been widely used as features in English Na...
research
12/13/2021

ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts

Named entity recognition (NER) is an important task that aims to resolve...
research
11/08/2022

Active Learning with Tabular Language Models

Despite recent advancements in tabular language model research, real-wor...
research
02/02/2023

Predefined domain specific embeddings of food concepts and recipes: A case study on heterogeneous recipe datasets

Although recipe data are very easy to come by nowadays, it is really har...
research
06/30/2023

Information Extraction in Domain and Generic Documents: Findings from Heuristic-based and Data-driven Approaches

Information extraction (IE) plays very important role in natural languag...
