Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus

09/11/2018
by Yonatan Belinkov, et al.

Arabic is a widely spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties. As a result, the study of the language's history has so far been mostly limited to small-scale manual analyses. In this work, we present a large-scale historical corpus of the written Arabic language, spanning 1400 years. We describe our efforts to clean and process this corpus using Arabic NLP tools, including the identification of reused text. We study the history of the Arabic language using a novel automatic periodization algorithm, as well as other techniques. Our findings confirm the established division of written Arabic into Modern Standard and Classical Arabic, as well as other established periodizations, while suggesting that written Arabic may be divisible into still further periods of development.

research
12/28/2016

Shamela: A Large-Scale Historical Arabic Corpus

Arabic is a widely-spoken language with a rich and long history spanning...
research
05/19/2022

Curras + Baladi: Towards a Levantine Corpus

The processing of the Arabic language is a complex field of research. Th...
research
03/20/2020

TArC: Incrementally and Semi-Automatically Collecting a Tunisian Arabish Corpus

This article describes the constitution process of the first morpho-synt...
research
01/23/2022

A Large and Diverse Arabic Corpus for Language Modeling

Language models (LMs) have introduced a major paradigm shift in Natural ...
research
06/18/2022

MANorm: A Normalization Dictionary for Moroccan Arabic Dialect Written in Latin Script

Social media user-generated text is actually the main resource for many ...
research
04/25/2019

Arabic Text Diacritization Using Deep Neural Networks

Diacritization of Arabic text is both an interesting and a challenging p...
research
07/12/2023

Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep Learning Approaches

Poetry holds immense significance within the cultural and traditional fa...
