Transliteration of Judeo-Arabic Texts into Arabic Script Using Recurrent Neural Networks

04/23/2020
by Nachum Dershowitz, et al.

Many of the great Jewish works of the Middle Ages were written in Judeo-Arabic, a Jewish branch of the Arabic language family that uses the Hebrew script as its writing system. In this work, we train a model to automatically transliterate Judeo-Arabic into Arabic script, with the aim of making these writings accessible to Arabic readers. We adopt a recurrent neural network (RNN) approach to the problem, applying connectionist temporal classification (CTC) loss to handle unequal input and output lengths. This choice requires an adjustment to the training data, termed doubling, to avoid input sequences that are shorter than their corresponding outputs. We also use a pretraining stage with a different loss function to help the network converge. Furthermore, since only a single source of parallel text was available for training, we examine the possibility of generating data synthetically from other original Arabic texts of the period. This is possible because, although the mapping convention applied by the Judeo-Arabic author is one-to-many from Judeo-Arabic to Arabic, its reverse (from Arabic to Judeo-Arabic) is a proper function. In this way, we attempt to train a model that can memorize words in the output language and that also uses context to resolve ambiguities in the transliteration. We examine this ability by testing on shuffled data that lacks context. We obtain an improvement over the baseline results (9.5 …), achieving 2 to 2.5 …
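CTC can emit a target of length U from an input of T frames only if T is at least U plus the number of identical adjacent labels in the target, because a blank must separate repeated labels. The abstract does not spell out the doubling procedure; the minimal Python sketch below assumes it means repeating every input symbol twice, which doubles the number of available frames. The blank symbol `_` and all helper names are hypothetical, and the toy strings stand in for real letters.

```python
BLANK = "_"  # hypothetical CTC blank symbol


def ctc_min_input_length(target: str) -> int:
    """Fewest input frames CTC needs to emit `target`: one frame per label,
    plus one blank frame between each pair of identical adjacent labels."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def double(source: str) -> str:
    """Assumed form of 'doubling': repeat each input symbol twice."""
    return "".join(ch * 2 for ch in source)


def ctc_greedy_collapse(frames: str, blank: str = BLANK) -> str:
    """Greedy CTC decoding of per-frame labels: merge runs of the same
    label, then drop blanks."""
    out = []
    prev = None
    for label in frames:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


if __name__ == "__main__":
    source, target = "abc", "abcd"  # toy pair whose output is longer than its input
    print(len(source) >= ctc_min_input_length(target))          # False
    print(len(double(source)) >= ctc_min_input_length(target))  # True
    print(ctc_greedy_collapse("aa_b__bc"))                      # abbc
```

Under this assumption, an input of S symbols yields T = 2S frames, so any target with U plus its repeat count at most 2S becomes reachable without changing the network itself.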
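Because the Arabic-to-Judeo-Arabic direction is a function, any Arabic text can be turned into a synthetic (Judeo-Arabic input, Arabic target) pair by character substitution. The sketch below uses a small, hypothetical subset of letter correspondences for illustration only; it is not claimed to match the convention of the author studied in the paper, it omits letters written with a diacritic mark such as geresh, and it ignores Hebrew final letter forms. The last helper shows why the reverse direction is ambiguous: two Arabic letters can share one Hebrew-script letter.

```python
# Hypothetical, partial Arabic -> Hebrew-script letter table (illustrative only).
AR_TO_JA = {
    "ا": "א", "ب": "ב", "ت": "ת", "د": "ד", "ر": "ר", "س": "ס",
    "ع": "ע", "ك": "כ", "ل": "ל", "م": "מ", "ن": "נ", "و": "ו",
    "ه": "ה", "ة": "ה",  # two Arabic letters share one Hebrew letter
    "ي": "י", "ى": "י",  # likewise
}


def to_judeo_arabic(arabic: str) -> str:
    """Map Arabic text to Hebrew script; unmapped characters pass through unchanged."""
    return "".join(AR_TO_JA.get(ch, ch) for ch in arabic)


def make_synthetic_pairs(arabic_lines):
    """Build (Judeo-Arabic input, Arabic target) training pairs from Arabic-only text."""
    return [(to_judeo_arabic(line), line) for line in arabic_lines]


def reverse_candidates(table: dict) -> dict:
    """Invert the table: each Hebrew letter maps to the set of Arabic letters
    it may stand for, which is where the one-to-many ambiguity shows up."""
    inverse = {}
    for ar, ja in table.items():
        inverse.setdefault(ja, set()).add(ar)
    return inverse


if __name__ == "__main__":
    print(make_synthetic_pairs(["كتاب"]))
    print(sorted(reverse_candidates(AR_TO_JA)["ה"]))
```

A model trained on such pairs must learn from the Arabic side which of the candidate letters is correct, which is the memorization and context use the abstract describes.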
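Testing on shuffled data removes the neighboring words a model could otherwise use to resolve ambiguous letters. A minimal sketch of building such a test set, assuming the parallel text is tokenized on whitespace and that its words align one to one (an assumption made only for this illustration); the function name and seed parameter are hypothetical.

```python
import random


def shuffled_word_pairs(parallel_lines, seed=0):
    """Split aligned (source, target) lines into word pairs and shuffle them,
    so that each test item carries no sentence context."""
    pairs = []
    for src, tgt in parallel_lines:
        src_words, tgt_words = src.split(), tgt.split()
        if len(src_words) != len(tgt_words):
            raise ValueError("lines are not word-aligned: %r / %r" % (src, tgt))
        pairs.extend(zip(src_words, tgt_words))
    # A seeded generator keeps the shuffled test set reproducible.
    random.Random(seed).shuffle(pairs)
    return pairs


if __name__ == "__main__":
    print(shuffled_word_pairs([("a b c", "x y z"), ("d e", "u v")], seed=1))
```

Shuffling whole word pairs, rather than each side separately, keeps every input word aligned with its target while destroying word order.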
