Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features

01/31/2017
by Rodrigo Agerri, et al.

We present a multilingual Named Entity Recognition (NER) approach based on a robust and general set of features that holds across languages and datasets. Our system combines shallow local information with semi-supervised clustering features induced from large amounts of unlabeled text. Determining through empirical experimentation how to combine various types of clustering features effectively allows us to port our system seamlessly to other datasets and languages. The result is a simple but highly competitive system that obtains state-of-the-art results across five languages and twelve datasets. Results are reported on standard shared-task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, despite the lack of linguistically motivated features, we also report the best results for languages such as Basque and German. In addition, we demonstrate that our method obtains very competitive results even when the amount of supervised data is cut by half, reducing the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial for developing robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.
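
To make the idea of combining shallow local information with clustering features concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which clusterings or feature templates the system uses, so everything here is an assumption for illustration: the `TOY_CLUSTERS` lookup, the bit-string cluster paths, the prefix lengths, and the names `word_shape` and `token_features` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lookup: lowercased word -> hierarchical cluster path.
# A real table would be induced from large amounts of unlabeled text.
TOY_CLUSTERS: Dict[str, str] = {
    "london": "110100",
    "paris": "110101",
    "visited": "0111",
}


def word_shape(token: str) -> str:
    """Collapse a token into a coarse shape, merging repeated symbols."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch  # keep punctuation as-is
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)


def token_features(
    tokens: List[str],
    i: int,
    clusters: Dict[str, str],
    prefix_lengths: Tuple[int, ...] = (2, 4, 6),  # assumed, not from the paper
) -> Dict[str, str]:
    """Build one feature dict for tokens[i]: shallow local cues plus cluster cues."""
    token = tokens[i]
    feats = {
        # Shallow local information: the word, its shape, a suffix, and neighbors.
        "word": token.lower(),
        "shape": word_shape(token),
        "suffix3": token[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Clustering features: prefixes of the cluster path give coarser or finer
    # word classes. Words missing from the table simply get no cluster features.
    path = clusters.get(token.lower())
    if path is not None:
        for n in prefix_lengths:
            feats[f"cluster_p{n}"] = path[:n]
    return feats


if __name__ == "__main__":
    sentence = ["She", "visited", "Paris"]
    for idx in range(len(sentence)):
        print(token_features(sentence, idx, TOY_CLUSTERS))
```

Feature dicts of this kind could be passed to any sequence labeler. Because the cluster features are derived from unlabeled text, a word that never appears in the annotated training data can still share features with words that do, which is one plausible reading of why such features would help with less supervised data and out-of-domain text.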
