ParaNames: A Massively Multilingual Entity Name Corpus

02/28/2022
by Jonne Sälevä, et al.

This preprint describes work in progress on ParaNames, a multilingual parallel name resource consisting of names for approximately 14 million entities. The included names span over 400 languages, and almost all entities are mapped to standardized entity types (PER/LOC/ORG). Using Wikidata as a source, we create the largest resource of this type to date. We describe our approach to filtering and standardizing the data to provide the best quality possible. ParaNames is useful for multilingual language processing, both for defining name translation and transliteration tasks and as supplementary data for tasks such as named entity recognition and entity linking. The resource is released on GitHub (https://github.com/bltlab/paranames) under a Creative Commons license (CC BY 4.0).
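As a rough illustration of how a parallel name resource of this kind might feed a transliteration task, the sketch below groups names by entity and extracts source/target name pairs for one language pair. The column names (`entity_id`, `language`, `name`, `type`), the sample rows, and the helper functions `load_names` and `name_pairs` are all assumptions made for this example; they are not taken from the released files and should be checked against the repository before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The schema and the entity IDs are invented for
# illustration and are NOT the confirmed format of the released resource.
SAMPLE_TSV = (
    "entity_id\tlanguage\tname\ttype\n"
    "E1\ten\tHelsinki\tLOC\n"
    "E1\tfi\tHelsinki\tLOC\n"
    "E1\tru\tХельсинки\tLOC\n"
    "E2\ten\tNokia\tORG\n"
    "E2\tru\tНокиа\tORG\n"
)


def load_names(tsv_text):
    """Group names by entity: {entity_id: {language: name}}, plus entity types."""
    names = defaultdict(dict)
    types = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        names[row["entity_id"]][row["language"]] = row["name"]
        types[row["entity_id"]] = row["type"]
    return names, types


def name_pairs(names, src, tgt):
    """Return (source name, target name) pairs for entities named in both languages."""
    return [
        (by_lang[src], by_lang[tgt])
        for by_lang in names.values()
        if src in by_lang and tgt in by_lang
    ]


names, types = load_names(SAMPLE_TSV)
pairs = name_pairs(names, "en", "ru")
```

Entities that lack a name in either language are skipped, so the pair list only contains names that are parallel for the requested language pair. The same grouping could be filtered by the `types` mapping to build per-type gazetteers for named entity recognition.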

