Soft Gazetteers for Low-Resource Named Entity Recognition

05/04/2020
by Shruti Rijhwani, et al.

Traditional named entity recognition models use gazetteers (lists of entities) as features to improve performance. Although modern neural network models do not require such hand-crafted features for strong performance, recent work has demonstrated their utility for named entity recognition on English data. However, designing such features for low-resource languages is challenging, because exhaustive entity gazetteers do not exist in these languages. To address this problem, we propose a method of "soft gazetteers" that incorporates ubiquitously available information from English knowledge bases, such as Wikipedia, into neural named entity recognition models through cross-lingual entity linking. Our experiments on four low-resource languages show an average improvement of 4 points in F1 score. Code and data are available at https://github.com/neulab/soft-gazetteers.
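
To make the idea concrete, the following is a minimal sketch of how entity-linking output might be turned into per-token "soft" gazetteer features. It is not the authors' implementation (that is in the repository linked above). It assumes a hypothetical upstream cross-lingual entity linker that returns, for each candidate span, a list of (entity type, score) pairs with scores in [0, 1]. The type inventory, the begin/inside feature layout, the max-pooling choice, and the function name `soft_gazetteer_features` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed entity type inventory; the real inventory depends on the dataset.
ENTITY_TYPES = ["PER", "ORG", "LOC"]

# Hypothetical linker output: (entity type, linking score in [0, 1]).
Candidate = Tuple[str, float]
# Token span as (start, end) offsets, end exclusive.
Span = Tuple[int, int]


def soft_gazetteer_features(
    num_tokens: int,
    span_candidates: Dict[Span, List[Candidate]],
) -> List[List[float]]:
    """Return one feature vector per token.

    Each vector has 2 * len(ENTITY_TYPES) entries. For every type there is
    a "begin" slot and an "inside" slot, holding the highest candidate score
    among all spans that cover the token in that position.
    """
    dim = 2 * len(ENTITY_TYPES)
    features = [[0.0] * dim for _ in range(num_tokens)]
    for (start, end), candidates in span_candidates.items():
        for ent_type, score in candidates:
            type_idx = ENTITY_TYPES.index(ent_type)
            for pos in range(start, end):
                # Slot 0 of the pair marks span-initial tokens, slot 1 the rest.
                slot = 2 * type_idx + (0 if pos == start else 1)
                features[pos][slot] = max(features[pos][slot], score)
    return features


# Illustrative input: four tokens, two overlapping candidate spans.
example_candidates = {
    (1, 3): [("LOC", 0.8), ("ORG", 0.3)],
    (1, 2): [("PER", 0.5)],
}
example_features = soft_gazetteer_features(4, example_candidates)
```

Unlike a binary gazetteer match, each feature here is a continuous score, so a token can carry partial evidence for several entity types at once. In a neural tagger, vectors like these would typically be concatenated with the word representations before the sequence-labeling layer, though that wiring is likewise an assumption here.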

research
02/19/2023

Exploring the Potential of Machine Translation for Generating Named Entity Datasets: A Case Study between Persian and English

This study focuses on the generation of Persian named entity datasets th...
research
08/16/2019

Named Entity Recognition for Nepali Language

Named Entity Recognition has been studied for different languages like ...
research
11/03/2020

Exhaustive Entity Recognition for Coptic: Challenges and Solutions

Entity recognition provides semantic access to ancient materials in the ...
research
06/09/2018

Robust Lexical Features for Improved Neural Network Named-Entity Recognition

Neural network approaches to Named-Entity Recognition reduce the need fo...
research
03/06/2020

Improving Neural Named Entity Recognition with Gazetteers

The goal of this work is to improve the performance of a neural named en...
research
02/25/2021

ANEA: Distant Supervision for Low-Resource Named Entity Recognition

Distant supervision allows obtaining labeled training corpora for low-re...
research
02/28/2022

ParaNames: A Massively Multilingual Entity Name Corpus

This preprint describes work in progress on ParaNames, a multilingual pa...
