Finetuning BERT on Partially Annotated NER Corpora

11/25/2022
by Viktor Scherbakov, et al.

Most Named Entity Recognition (NER) models operate under the assumption that training datasets are fully labelled. While this holds for established datasets such as CoNLL 2003 and OntoNotes, obtaining complete annotation is not always feasible, for instance when entities are annotated selectively to reduce cost. This work presents an approach to finetuning BERT on such partially labelled datasets using self-supervision and label preprocessing. Our approach outperforms the previous LSTM-based label-preprocessing baseline, significantly improving performance on poorly labelled datasets. We demonstrate that following our approach while finetuning RoBERTa on the CoNLL 2003 dataset with only 10 entities labelled is enough to reach the performance of the baseline trained on the same dataset with 50
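The abstract does not spell out the label-preprocessing step, but a common formulation for partially annotated NER is to treat every token outside a known entity span as *unlabelled* rather than as "O", and to exclude those positions from the loss. The sketch below is a minimal, hypothetical illustration of that idea; the function name, tag set, and the `-100` ignore index (the convention used by PyTorch's cross-entropy loss) are assumptions, not the paper's actual implementation.

```python
# Hypothetical label preprocessing for partially annotated NER.
# Only the annotated spans are trusted; every other token is mapped
# to an ignore index so it does not contribute to the training loss
# (it might be an entity the annotators skipped).

IGNORE_INDEX = -100  # PyTorch cross-entropy convention for masked labels
LABEL2ID = {"O": 0, "B-PER": 1, "I-PER": 2, "B-ORG": 3, "I-ORG": 4}

def preprocess_partial_labels(tokens, entity_spans):
    """Map a partially annotated sentence to training label ids.

    entity_spans: list of (start, end, type) with `end` exclusive;
    these are the only spans known to be correct. All remaining
    tokens get IGNORE_INDEX instead of "O".
    """
    labels = [IGNORE_INDEX] * len(tokens)
    for start, end, etype in entity_spans:
        labels[start] = LABEL2ID[f"B-{etype}"]
        for i in range(start + 1, end):
            labels[i] = LABEL2ID[f"I-{etype}"]
    return labels

tokens = ["John", "works", "at", "Google", "."]
# Suppose only the PER entity was annotated and "Google" was skipped:
print(preprocess_partial_labels(tokens, [(0, 1, "PER")]))
# → [1, -100, -100, -100, -100]
```

The self-supervision stage mentioned in the abstract would then typically fill some of these ignored positions with confident model predictions (pseudo-labels) before further finetuning; that stage is not shown here.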


