Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping

07/08/2017
by Jian Ni, et al.

State-of-the-art named entity recognition (NER) systems are statistical machine learning models with strong generalization capability: using lexical and contextual information, they can recognize unseen entities that do not appear in the training data. However, such a model can still make mistakes if its features favor the wrong entity type. In this paper, we use Wikipedia as an open knowledge base to improve multilingual NER systems. Central to our approach is the construction of high-accuracy, high-coverage multilingual Wikipedia entity type mappings. These mappings are built from weakly annotated data and can be extended to new languages without human annotation or language-dependent knowledge. Based on these mappings, we develop several approaches to improve an NER system. We evaluate these approaches in experiments on NER systems trained for 6 languages. The results show that the proposed approaches improve the accuracy of such systems on unseen entities, especially when a system is applied to a new domain or trained with little training data (up to 18.3 F1 score improvement).
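
The abstract does not say how the mappings are applied, so the following is only a minimal sketch of one plausible use: overriding a low-confidence predicted type when a mention has an entry in an entity type mapping. The `Mention` type, the toy `TYPE_MAPPING` entries, the `apply_type_mapping` function, and the `threshold` parameter are all hypothetical names introduced for illustration, not the paper's method.

```python
from typing import Dict, List, NamedTuple


class Mention(NamedTuple):
    """A mention predicted by some NER model (hypothetical structure)."""
    text: str
    entity_type: str
    confidence: float


# Toy stand-in for a Wikipedia entity type mapping (title -> entity type).
# These entries are illustrative assumptions, not data from the paper.
TYPE_MAPPING: Dict[str, str] = {
    "IBM": "ORGANIZATION",
    "Paris": "LOCATION",
}


def apply_type_mapping(
    mentions: List[Mention],
    mapping: Dict[str, str],
    threshold: float = 0.9,
) -> List[Mention]:
    """Replace a predicted type with the mapped type when the model is unsure.

    A mention is corrected only if it has a mapping entry, the mapped type
    differs from the prediction, and the confidence is below `threshold`.
    """
    corrected = []
    for mention in mentions:
        mapped_type = mapping.get(mention.text)
        if (
            mapped_type is not None
            and mapped_type != mention.entity_type
            and mention.confidence < threshold
        ):
            mention = mention._replace(entity_type=mapped_type)
        corrected.append(mention)
    return corrected


predictions = [
    Mention("IBM", "PERSON", 0.55),      # wrong type, low confidence
    Mention("Paris", "LOCATION", 0.97),  # already agrees with the mapping
    Mention("Jian Ni", "PERSON", 0.80),  # no mapping entry, left unchanged
]
corrected = apply_type_mapping(predictions, TYPE_MAPPING)
```

In this sketch only the first prediction changes, because it is the only one that both disagrees with the mapping and falls below the confidence threshold. A real system would also need to resolve mentions to Wikipedia pages, which is not shown here.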


research
09/11/2017

KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition

KnowNER is a multilingual Named Entity Recognition (NER) system that lev...
research
01/19/2021

Single versus Multiple Annotation for Named Entity Recognition of Mutations

The focus of this paper is to address the knowledge acquisition bottlene...
research
05/04/2023

USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER

This paper describes the system developed by the USTC-NELSLIP team for S...
research
04/27/2021

Named Entity Recognition and Linking Augmented with Large-Scale Structured Data

In this paper we describe our submissions to the 2nd and 3rd SlavNER Sha...
research
06/22/2022

Evaluation of Embedding Models for Automatic Extraction and Classification of Acknowledged Entities in Scientific Documents

Acknowledgments in scientific papers may give an insight into aspects of...
research
07/25/2023

Embedding Models for Supervised Automatic Extraction and Classification of Named Entities in Scientific Acknowledgements

Acknowledgments in scientific papers may give an insight into aspects of...
