Building and Evaluating Universal Named-Entity Recognition English corpus

12/14/2022
by Diego Alves, et al.

This article applies the Universal Named Entity framework to generate automatically annotated corpora. Using a workflow that extracts Wikipedia data and metadata together with DBpedia information, we generated an English dataset, which we describe and evaluate. We also conducted a set of experiments to improve the annotations in terms of precision, recall, and F1-measure. The final dataset is available, and the established workflow can be applied to any language for which Wikipedia and DBpedia exist. In future research, we intend to continue improving the annotation process and to extend it to other languages.
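For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how precision, recall, and F1 can be computed for entity annotations. It assumes exact-match scoring, where a predicted entity counts as correct only if its span and label both match a gold entity; the abstract does not state which matching criterion the authors used. The function name, the tuple layout, and the `PER`/`LOC`/`ORG` labels are illustrative and are not taken from the UNER hierarchy.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (start_token, end_token, label).
Entity = Tuple[int, int, str]


def precision_recall_f1(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Exact-match scores: span and label must both agree with the gold entity."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)
    true_positives = len(gold_set & pred_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data only, not drawn from the dataset described here.
gold = [(0, 1, "PER"), (4, 5, "LOC"), (8, 9, "ORG")]
predicted = [(0, 1, "PER"), (4, 5, "ORG")]
p, r, f = precision_recall_f1(gold, predicted)
```

In this toy case, one of the two predictions is correct and one of the three gold entities is found, so precision exceeds recall. The second prediction has the right span but the wrong label, which exact-match scoring treats as an error.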

research
10/23/2020

UNER: Universal Named-Entity Recognition Framework

We introduce the Universal Named-Entity Recognition (UNER) framework, a 4...
research
12/14/2022

Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia

With the ever-growing popularity of the field of NLP, the demand for dat...
research
02/19/2023

Exploring the Potential of Machine Translation for Generating Named Entity Datasets: A Case Study between Persian and English

This study focuses on the generation of Persian named entity datasets th...
research
02/26/2020

Detecting Potential Topics In News Using BERT, CRF and Wikipedia

For a news content distribution platform like Dailyhunt, Named Entity Re...
research
05/22/2023

Aligning the Norwegian UD Treebank with Entity and Coreference Information

This paper presents a merged collection of entity and coreference annota...
research
11/11/2020

Overview of CAPITEL Shared Tasks at IberLEF 2020: Named Entity Recognition and Universal Dependencies Parsing

We present the results of the CAPITEL-EVAL shared task, held in the cont...
research
10/04/2017

Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl

We present DepCC, the largest to date linguistically analyzed corpus in ...