Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

11/01/2020
by   Badr AlKhamissi, et al.

We propose a novel architecture for labelling character sequences that achieves state-of-the-art results on the Tashkeela Arabic diacritization benchmark. The core is a two-level recurrence hierarchy that operates on the word and character levels separately, enabling faster training and inference than comparable traditional models. A cross-level attention module further connects the two levels and opens the door for network interpretability. The task module is a softmax classifier that enumerates valid combinations of diacritics. This architecture can be extended with a recurrent decoder that optionally accepts priors from partially diacritized text, which improves results. We employ extra tricks such as sentence dropout and majority voting to further boost the final result. Our best model achieves a WER of 5.34%, outperforming the previous state-of-the-art with a 30.56% relative error reduction.
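To make the closing figures concrete, the following is a minimal Python sketch of how a word error rate and a relative error reduction relate to each other. It assumes that WER is the percentage of words whose diacritized form differs from the reference, and that the 30.56 figure is a relative reduction in percent; the function names `word_error_rate`, `relative_reduction`, and `implied_baseline` are hypothetical and are not taken from the paper.

```python
def word_error_rate(predicted, reference):
    """Percentage of words whose predicted form differs from the reference.

    Assumption: both inputs are lists of already-tokenized words of equal length.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same number of words")
    if not reference:
        return 0.0
    errors = sum(p != r for p, r in zip(predicted, reference))
    return 100.0 * errors / len(reference)


def relative_reduction(baseline, improved):
    """Relative error reduction, in percent, from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


def implied_baseline(improved, reduction_pct):
    """Baseline error implied by an improved error and a relative reduction."""
    return improved / (1.0 - reduction_pct / 100.0)


# Figures from the abstract: WER 5.34 after a 30.56 (percent, assumed) relative reduction.
baseline = implied_baseline(5.34, 30.56)
print(round(baseline, 2))  # 7.69
```

Under these assumptions, the reported numbers imply a previous best WER of roughly 7.69 on the same benchmark; this value is derived from the two figures above rather than quoted from the text.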

research
09/18/2020

An Efficient Language-Independent Multi-Font OCR for Arabic Script

Optical Character Recognition (OCR) is the process of extracting digitiz...
research
10/15/2018

Diacritization of Maghrebi Arabic Sub-Dialects

Diacritization process attempt to restore the short vowels in Arabic wri...
research
03/01/2021

Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI...
research
06/20/2020

AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss

Classical and some deep learning techniques for Arabic text classificati...
research
07/27/2018

Improving Neural Sequence Labelling using Additional Linguistic Information

Sequence labelling is the task of assigning categorical labels to a data...
research
02/04/2020

Arabic Diacritic Recovery Using a Feature-Rich biLSTM Model

Diacritics (short vowels) are typically omitted when writing Arabic text...
research
06/06/2023

Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text

Automatic Arabic diacritization is useful in many applications, ranging ...
