Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

05/19/2022
by Mustafa Jarrar, et al.

This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialect tokens that are manually annotated with 21 entity types, including person, organization, location, event and date. More importantly, the corpus is annotated with nested entities instead of the more common flat annotations. The data contains about 75K entities, 22.5% of which are nested. The inter-annotator evaluation of the corpus showed strong agreement, with a Cohen's Kappa of 0.979 and an F1-score of 0.976. To validate our data, we used the corpus to train a nested NER model based on multi-task learning and AraBERT (Arabic BERT). The model achieved an overall micro F1-score of 0.884. Our corpus, the annotation guidelines, the source code and the pre-trained model are publicly available.
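
To make the notion of nesting concrete, the following minimal sketch represents entity mentions as token spans and checks which mentions lie inside another mention. It is an illustration only, not the Wojood annotation format or the authors' code: the `Span` type, the helper names, the English example sentence, and the labels used are all assumptions introduced here.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int   # index of the first token (inclusive)
    end: int     # index just past the last token (exclusive)
    label: str   # illustrative entity type, not taken from the corpus


def is_nested(inner: Span, outer: Span) -> bool:
    """Return True when `inner` lies within `outer` and their boundaries differ."""
    same_boundaries = (inner.start, inner.end) == (outer.start, outer.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_boundaries


def nested_spans(spans: List[Span]) -> List[Span]:
    """Collect every span that is embedded inside at least one other span."""
    return [s for s in spans if any(is_nested(s, o) for o in spans)]


# Hypothetical sentence: the location mention sits inside the organization mention.
tokens = ["Bank", "of", "Palestine", "opened", "in", "1960"]
spans = [
    Span(0, 3, "ORG"),   # "Bank of Palestine"
    Span(2, 3, "GPE"),   # "Palestine", nested inside the ORG span
    Span(5, 6, "DATE"),  # "1960", a flat mention
]

nested = nested_spans(spans)
nested_share = len(nested) / len(spans)
```

A flat annotation scheme would have to choose one of the two overlapping mentions, whereas a nested scheme keeps both. The same kind of count over a whole corpus gives a figure such as the share of nested entities quoted in the abstract.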


