ANETAC: Arabic Named Entity Transliteration and Classification Dataset

In this paper, we make freely accessible ANETAC, our English-Arabic named entity transliteration and classification dataset, which we built from freely available parallel translation corpora. The dataset contains 79,924 instances. Each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class, which is one of Person, Location, or Organization. ANETAC is aimed mainly at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification.
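
As a rough illustration of the (e, a, c) triplet structure described above, the following Python sketch models one instance and parses it from a line of text. The abstract does not specify the on-disk format, so the tab-separated layout, the label strings `PER`, `LOC`, and `ORG`, and the names `Instance`, `parse_line`, and `class_counts` are all assumptions made for this example. The sample rows are illustrative and are not taken from the dataset.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed label strings for Person, Location, Organization (not confirmed by the source).
CLASSES = {"PER", "LOC", "ORG"}


@dataclass(frozen=True)
class Instance:
    """One (e, a, c) triplet: English entity, Arabic transliteration, class."""
    english: str
    arabic: str
    label: str


def parse_line(line: str) -> Instance:
    """Parse one line, assuming a hypothetical tab-separated 'e<TAB>a<TAB>c' layout."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    english, arabic, label = fields
    if label not in CLASSES:
        raise ValueError(f"unknown class label: {label!r}")
    return Instance(english, arabic, label)


def class_counts(instances):
    """Count how many instances fall into each class."""
    return Counter(inst.label for inst in instances)


# Illustrative rows only; they are not drawn from ANETAC.
sample_lines = ["London\tلندن\tLOC", "Ahmed\tأحمد\tPER"]
sample = [parse_line(line) for line in sample_lines]
```

For transliteration work, only the `english` and `arabic` fields would be used as a source/target pair, while a classification setup would pair either name field with `label`.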


