Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM

08/19/2017
by Mohamed Eldesouki, et al.

Arabic word segmentation is essential for a variety of NLP applications, such as machine translation and information retrieval. Segmentation entails breaking words into their constituent stems, affixes, and clitics. In this paper, we compare two approaches for segmenting four major Arabic dialects using only several thousand training examples for each dialect. The first poses segmentation as a ranking problem, where an SVM ranker picks the best segmentation; the second poses it as a sequence labeling problem, where a bi-LSTM RNN coupled with a CRF determines where best to segment words. We achieve solid segmentation results for all dialects despite the limited training data. We also show that employing Modern Standard Arabic data for domain adaptation and assuming context independence improve overall results.
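The two problem framings in the abstract can be made concrete with a small sketch. It is not the authors' implementation: the B/I label scheme, the `+` separator, the function names, and the Buckwalter-style example word `w+ktAb+hA` are all assumptions chosen for illustration. The sequence-labeling view assigns one label per character, while the ranking view enumerates candidate segmentations that a ranker (such as an SVM) would then score.

```python
from itertools import product


def to_labels(segmented, sep="+"):
    """Sequence-labeling view: one label per character.

    'B' marks the first character of a segment, 'I' any other character.
    (Hypothetical label scheme, not necessarily the one used in the paper.)
    """
    labels = []
    for segment in segmented.split(sep):
        if not segment:
            continue  # tolerate doubled or trailing separators
        labels.append("B")
        labels.extend("I" * (len(segment) - 1))
    return labels


def from_labels(chars, labels, sep="+"):
    """Rebuild the segmented word from characters and predicted labels."""
    if len(chars) != len(labels):
        raise ValueError("need exactly one label per character")
    out = []
    for i, (ch, label) in enumerate(zip(chars, labels)):
        if label == "B" and i > 0:
            out.append(sep)  # a new segment starts here
        out.append(ch)
    return "".join(out)


def candidate_segmentations(word, sep="+"):
    """Ranking view: enumerate every possible split of the word.

    A ranker would score each candidate and keep the best one. There are
    2 ** (len(word) - 1) candidates, so a real system would prune them.
    """
    if not word:
        return []
    candidates = []
    for cuts in product([False, True], repeat=len(word) - 1):
        pieces = [word[0]]
        for ch, cut in zip(word[1:], cuts):
            if cut:
                pieces.append(sep)
            pieces.append(ch)
        candidates.append("".join(pieces))
    return candidates


if __name__ == "__main__":
    gold = "w+ktAb+hA"  # illustrative transliterated word
    labels = to_labels(gold)
    print(labels)
    print(from_labels(gold.replace("+", ""), labels))
    print(len(candidate_segmentations("wktAbhA")))
```

In the labeling view a model only has to predict the label sequence, and `from_labels` recovers the segmentation; in the ranking view the hard part is the scoring function, which this sketch deliberately leaves out.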


research
10/15/2018

Diacritization of Maghrebi Arabic Sub-Dialects

The diacritization process attempts to restore the short vowels in Arabic wri...
research
11/14/2018

Melodic Phrase Segmentation By Deep Neural Networks

Automated melodic phrase detection and segmentation is a classical task ...
research
09/07/2015

Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation

Statistical machine translation for dialectal Arabic is characterized by...
research
02/07/2017

Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

The effectiveness of three stop words lists for Arabic Information Retri...
research
11/15/2019

An Accuracy-Enhanced Stemming Algorithm for Arabic Information Retrieval

This paper provides a method for indexing and retrieving Arabic texts, b...
research
11/16/2019

Contribution au Niveau de l'Approche Indirecte à Base de Transfert dans la Traduction Automatique

In this thesis, we address several important issues concerning the morph...
research
04/11/2018

Aesthetical Attributes for Segmenting Arabic Word

The connected allograph representing calligraphic Arabic word does not a...
