Finetuning BERT on Partially Annotated NER Corpora

11/25/2022
by Viktor Scherbakov, et al.

Most Named Entity Recognition (NER) models operate under the assumption that training datasets are fully labelled. While this holds for established datasets such as CoNLL 2003 and OntoNotes, obtaining complete annotation is not always feasible, for instance when entities are annotated selectively to reduce cost. This work presents an approach to finetuning BERT on such partially labelled datasets using self-supervision and label preprocessing. Our approach outperforms the previous LSTM-based label-preprocessing baseline, significantly improving performance on poorly labelled datasets. We demonstrate that following our approach while finetuning RoBERTa on the CoNLL 2003 dataset with only 10 entities labelled is enough to reach the performance of the baseline trained on the same dataset with 50
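The abstract does not spell out the label-preprocessing step, but a common formulation for partially annotated NER is to treat every token outside a known entity span as *unlabelled* rather than as "O", and to exclude those positions from the loss. The sketch below is a minimal, hypothetical illustration of that idea; the function name, tag set, and the `-100` ignore index (the convention used by PyTorch's cross-entropy loss) are assumptions, not the paper's actual implementation.

```python
# Hypothetical label preprocessing for partially annotated NER.
# Only the annotated spans are trusted; every other token is mapped
# to an ignore index so it does not contribute to the training loss
# (it might be an entity the annotators skipped).

IGNORE_INDEX = -100  # PyTorch cross-entropy convention for masked labels
LABEL2ID = {"O": 0, "B-PER": 1, "I-PER": 2, "B-ORG": 3, "I-ORG": 4}

def preprocess_partial_labels(tokens, entity_spans):
    """Map a partially annotated sentence to training label ids.

    entity_spans: list of (start, end, type) with `end` exclusive;
    these are the only spans known to be correct. All remaining
    tokens get IGNORE_INDEX instead of "O".
    """
    labels = [IGNORE_INDEX] * len(tokens)
    for start, end, etype in entity_spans:
        labels[start] = LABEL2ID[f"B-{etype}"]
        for i in range(start + 1, end):
            labels[i] = LABEL2ID[f"I-{etype}"]
    return labels

tokens = ["John", "works", "at", "Google", "."]
# Suppose only the PER entity was annotated and "Google" was skipped:
print(preprocess_partial_labels(tokens, [(0, 1, "PER")]))
# → [1, -100, -100, -100, -100]
```

The self-supervision stage mentioned in the abstract would then typically fill some of these ignored positions with confident model predictions (pseudo-labels) before further finetuning; that stage is not shown here.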


