KazNERD: Kazakh Named Entity Recognition Dataset

11/26/2021
by Rustem Yeshpanov, et al.

We present the development of a dataset for Kazakh named entity recognition. The dataset was built because there is a clear need for publicly available annotated corpora in Kazakh, as well as for annotation guidelines containing straightforward, but rigorous, rules and examples. The dataset annotation, based on the IOB2 scheme, was carried out on television news text by two native Kazakh speakers under the supervision of the first author. The resulting dataset contains 112,702 sentences and 136,333 annotations for 25 entity classes. State-of-the-art machine learning models to automatise Kazakh named entity recognition were also built, with the best-performing model achieving an exact match F1-score of 97.22%. The dataset, guidelines, and code used to train the models are freely available for download under the CC BY 4.0 licence from https://github.com/IS2AI/KazNERD.
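
The IOB2 scheme and the exact match F1-score mentioned above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `iob2_to_spans` and `exact_match_f1` are hypothetical, the tag sequences are made up for illustration, and the class labels PERSON and GPE are assumed examples rather than a confirmed subset of the 25 classes. In IOB2, the first token of every entity is tagged `B-<class>`, subsequent tokens of the same entity are tagged `I-<class>`, and all other tokens are tagged `O`. Under exact matching, a predicted entity counts as correct only if its boundaries and class both agree with the gold annotation.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity class)
Span = Tuple[int, int, str]


def iob2_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from the IOB2 tags of one sentence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes a span that runs to the sentence end.
    for i, tag in enumerate(tags + ["O"]):
        # An open span continues only on an I- tag of the same class.
        if label is not None and tag == f"I-{label}":
            continue
        if label is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # A stray I- tag with no open span is ignored (assumption:
        # strict IOB2 requires every entity to begin with B-).
    return spans


def exact_match_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over entity spans that match exactly."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with its sentence index to keep spans distinct.
        gold_spans |= {(idx, *s) for s in iob2_to_spans(g)}
        pred_spans |= {(idx, *s) for s in iob2_to_spans(p)}
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Illustrative tags only: a two-token PERSON entity and a one-token GPE.
gold_tags = [["B-PERSON", "I-PERSON", "O", "B-GPE"]]
# The prediction truncates the PERSON span, so only the GPE span matches.
pred_tags = [["B-PERSON", "O", "O", "B-GPE"]]
score = exact_match_f1(gold_tags, pred_tags)
```

In this example one of two predicted spans matches one of two gold spans, so precision and recall are both 0.5. The truncated PERSON span earns no partial credit, which is what makes the metric "exact match".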
