Guidelines and Annotation Framework for Arabic Author Profiling

08/23/2018
by   Wajdi Zaghouani, et al.

In this paper, we present the annotation pipeline and the guidelines we wrote as part of an effort to create a large, manually annotated Arabic author profiling dataset from various social media sources covering 16 Arabic countries and 11 dialectal regions. The target size of the annotated ARAP-Tweet corpus is more than 2.4 million words. We illustrate and summarize our general and dialect-specific guidelines for each of the selected dialectal regions. We also present the annotation framework and logistics. We monitor annotation quality by regularly computing inter-annotator agreement during the annotation process. Finally, we describe the issues encountered during the annotation phase, especially those related to the peculiarities of Arabic dialectal varieties as used in social media.
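
As an illustration of what computing inter-annotator agreement can involve, the following is a minimal sketch of Cohen's kappa for two annotators who labeled the same items. The abstract does not say which agreement measure the authors used, so the choice of kappa here is an assumption, and the function name `cohen_kappa` and the sample gender labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items.

    Assumption: kappa is used only as an example measure; the abstract
    does not name the agreement metric applied to the corpus.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")

    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently with their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four user profiles.
annotator_a = ["male", "female", "male", "female"]
annotator_b = ["male", "female", "female", "female"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

Recomputing such a score on a shared batch at intervals during annotation gives an early signal when annotators start to interpret the guidelines differently.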

