Comparing Open Arabic Named Entity Recognition Tools

05/12/2022
by   Abdullah Aldumaykhi, et al.

This paper compares and evaluates the performance of three open Arabic NER tools: CAMeL, Hatmi, and Stanza. We collected a corpus of 30 articles written in Modern Standard Arabic (MSA) and manually annotated all person, organization, and location entities at the article (document) level. Our results suggest that Stanza and Hatmi perform similarly, with Hatmi achieving the highest F1 score for all three entity types. CAMeL, however, achieved the highest precision for names of people and organizations. We then implemented a "merge" method that combines the results of the three tools and a "vote" method that tags a named entity only when two of the three tools identify it as an entity. Merging achieved the highest overall F1 scores. Moreover, merging had the highest recall and voting had the highest precision for all three entity types, which indicates that merging is more suitable when recall matters most, while voting is preferable when precision is required. Finally, we collected a corpus of 21,635 articles related to COVID-19 and applied the merge and vote methods to it. Our analysis demonstrates the tradeoff between precision and recall for the two methods.
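The merge and vote methods described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each tool's output for a document has already been reduced to a set of (entity text, entity type) pairs, that entities match only on exact equality of both fields, and that "vote" means at least two tools agree. The function names, the `min_votes` parameter, and the toy data are all hypothetical, and the real output formats of CAMeL, Hatmi, and Stanza are not modeled.

```python
from collections import Counter

# Assumed representation: (entity text, entity type), e.g. ("Cairo", "LOC").
Entity = tuple[str, str]


def merge(*tool_outputs: set[Entity]) -> set[Entity]:
    """Union: keep every entity reported by any tool."""
    return set().union(*tool_outputs)


def vote(*tool_outputs: set[Entity], min_votes: int = 2) -> set[Entity]:
    """Keep only entities reported by at least `min_votes` tools."""
    counts = Counter(entity for output in tool_outputs for entity in output)
    return {entity for entity, n in counts.items() if n >= min_votes}


def precision_recall_f1(predicted: set[Entity], gold: set[Entity]):
    """Document-level scores from exact matches against the gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical, not taken from the paper's corpus).
gold = {("Cairo", "LOC"), ("WHO", "ORG"), ("Ahmed", "PER"), ("Riyadh", "LOC")}
tool_a = {("Cairo", "LOC"), ("WHO", "ORG")}
tool_b = {("Cairo", "LOC"), ("Ahmed", "PER"), ("Nile", "PER")}  # one false positive
tool_c = {("Cairo", "LOC"), ("Riyadh", "LOC")}

merged = merge(tool_a, tool_b, tool_c)
voted = vote(tool_a, tool_b, tool_c)

print("merge:", precision_recall_f1(merged, gold))
print("vote: ", precision_recall_f1(voted, gold))
```

On this toy input, merging recovers every gold entity but also keeps the single-tool false positive, while voting keeps only the entity that several tools agree on. This mirrors the recall versus precision tradeoff the abstract reports, although the numbers themselves carry no evidential weight.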


