Partially-Typed NER Datasets Integration: Connecting Practice to Theory

05/01/2020
by Shi Zhi, et al.

While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may cover only a subset of them. Instead of relying on fully-typed NER datasets, many efforts have been made to leverage multiple partially-typed ones for training, so that the resulting model covers the full type set. However, there is neither a guarantee on the quality of the integrated datasets nor guidance on the design of training algorithms. Here, we conduct a systematic analysis and comparison of partially-typed and fully-typed NER datasets, both theoretically and empirically. First, we derive a bound establishing that models trained with partially-typed annotations can reach performance similar to that of models trained with fully-typed annotations; the bound also provides guidance on algorithm design. Moreover, we conduct controlled experiments, which show that partially-typed datasets lead to performance similar to that of a model trained with the same amount of fully-typed annotations.
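To make the notion of a partially-typed dataset concrete, here is a minimal Python sketch of one common way to read such annotations. It is an illustration under stated assumptions, not necessarily the formulation used in the paper. The function name `allowed_labels`, the type names, and the example sentence are all hypothetical, and tags are per-token types without a BIO scheme for simplicity. The idea: an "O" tag in a dataset that annotates only some types rules out those types, but leaves the token free to belong to any type the dataset does not cover.

```python
from typing import List, Set


def allowed_labels(tag: str, annotated_types: Set[str], full_types: Set[str]) -> Set[str]:
    """Return the labels a token may take under the unified (full) type set.

    Assumption: a non-"O" tag is trusted as-is, while "O" only excludes
    the types that this particular dataset annotates.
    """
    if tag != "O":
        return {tag}
    # "O" here means "none of the annotated types"; unannotated types stay possible.
    return {"O"} | (full_types - annotated_types)


# Hypothetical full type set and a dataset that annotates only PER.
full_types = {"PER", "LOC", "ORG"}
dataset_a_types = {"PER"}

sentence: List[str] = ["Alice", "visited", "Paris"]
tags_a: List[str] = ["PER", "O", "O"]

# Per-token sets of labels that are consistent with the partial annotation.
constraints = [allowed_labels(t, dataset_a_types, full_types) for t in tags_a]
```

In this sketch, "Paris" is tagged "O" only because the dataset does not annotate locations, so its constraint set still contains LOC. A training objective could then, for instance, score all label sequences consistent with these sets rather than a single gold sequence; whether that matches the paper's algorithm is not stated in the abstract.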

research
07/06/2022

Rethinking the Value of Gazetteer in Chinese Named Entity Recognition

Gazetteer is widely used in Chinese named entity recognition (NER) to en...
research
09/20/2019

Named Entity Recognition with Partially Annotated Training Data

Supervised machine learning assumes the availability of fully-labeled da...
research
09/25/2019

Learning A Unified Named Entity Tagger From Multiple Partially Annotated Corpora For Efficient Adaptation

Named entity recognition (NER) identifies typed entity mentions in raw t...
research
04/19/2022

Named Entity Recognition for Partially Annotated Datasets

The most common Named Entity Recognizers are usually sequence taggers tr...
research
11/25/2022

Finetuning BERT on Partially Annotated NER Corpora

Most Named Entity Recognition (NER) models operate under the assumption ...
research
05/02/2020

Sources of Transfer in Multilingual Named Entity Recognition

Named-entities are inherently multilingual, and annotations in any given...
