pioNER: Datasets and Baselines for Armenian Named Entity Recognition

10/19/2018
by Tsolak Ghukasyan, et al.

In this work, we tackle the problem of Armenian named entity recognition, providing silver- and gold-standard datasets and establishing baseline results with popular models. We present a 163,000-token named entity corpus automatically generated and annotated from Wikipedia, and a 53,400-token corpus of news sentences manually annotated with person, organization, and location named entities. The corpora were used to train and evaluate several popular named entity recognition models. Alongside the datasets, we release 50-, 100-, 200-, and 300-dimensional GloVe word embeddings trained on a collection of Armenian texts from Wikipedia, news, blogs, and encyclopedia.

