A Deep Belief Network Classification Approach for Automatic Diacritization of Arabic Text

06/17/2021
by Omar Al-Kadi, et al.

Deep learning has emerged as a new area of machine learning research. It learns features and hierarchical representations directly from data and has been applied successfully in fields such as image, sound, text, and motion processing. Techniques developed in deep learning research are already influencing research in Natural Language Processing (NLP). Arabic diacritics are vital components of Arabic text: they remove ambiguity from words and reinforce the meaning of the text. In this paper, a Deep Belief Network (DBN) is used as a diacritizer for Arabic text. The DBN is a deep learning algorithm that has recently proved very effective on a variety of machine learning problems. We evaluate DBNs as classifiers for automatic Arabic text diacritization: the DBN was trained to classify each input letter individually into its corresponding diacritized form. Experiments were conducted on two benchmark datasets, LDC ATB3 and Tashkeela. Our best settings achieve a diacritic error rate (DER) of 2.21% and a word error rate (WER) of 6.73% on the ATB3 benchmark, an improvement of 26% over the best published results. On the Tashkeela benchmark, our system continues to achieve high accuracy, with a DER of 1.79% and a 14% improvement.
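Because the DBN assigns one diacritic class to each input letter, the two reported metrics can be computed directly from aligned per-letter labels. The sketch below is a minimal illustration under stated assumptions: DER is taken as the share of letters whose predicted diacritic differs from the reference, and WER as the share of words containing at least one such error. The function name `diacritic_error_rates` and the short labels (`"a"`, `"u"`, `"i"`, and `""` for no diacritic) are hypothetical, and the abstract does not state the paper's exact counting conventions (for example, whether case endings or non-Arabic characters are scored).

```python
from typing import List, Tuple

# Hypothetical representation: a word is a sequence of per-letter diacritic
# labels, e.g. ["a", "", "i"], where "" marks a letter with no diacritic.
# Letters are assumed identical in reference and prediction, since the
# classifier only assigns a diacritic class to each given input letter.
Word = List[str]


def diacritic_error_rates(reference: List[Word], predicted: List[Word]) -> Tuple[float, float]:
    """Return (DER, WER) as fractions in [0, 1] (assumed definitions).

    DER: share of letters whose predicted diacritic differs from the reference.
    WER: share of words containing at least one letter-level diacritic error.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and prediction must have the same number of words")

    letter_total = 0
    letter_errors = 0
    word_errors = 0
    for ref_word, pred_word in zip(reference, predicted):
        if len(ref_word) != len(pred_word):
            raise ValueError("words must be aligned letter by letter")
        # Count letters in this word whose diacritic label was misclassified.
        errors = sum(r != p for r, p in zip(ref_word, pred_word))
        letter_total += len(ref_word)
        letter_errors += errors
        # A single wrong diacritic makes the whole word count as an error.
        word_errors += int(errors > 0)

    der = letter_errors / letter_total if letter_total else 0.0
    wer = word_errors / len(reference) if reference else 0.0
    return der, wer


if __name__ == "__main__":
    ref = [["a", "", "i", "u", "a"], ["u", "a", "", "i", "a"]]
    pred = [["a", "", "i", "u", "a"], ["u", "a", "a", "i", "a"]]
    der, wer = diacritic_error_rates(ref, pred)
    print(f"DER = {der:.2%}, WER = {wer:.2%}")
```

In the usage example, one wrong label out of ten letters gives a DER of 10%, but it spoils one of the two words, giving a WER of 50%. This illustrates why WER is typically higher than DER, as in the ATB3 figures above (6.73% versus 2.21%).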

research
09/26/2020

Automatic Arabic Dialect Identification Systems for Written Texts: A Survey

Arabic dialect identification is a specific task of natural language pro...
research
05/12/2023

Towards Transliteration between Sindhi Scripts from Devanagari to Perso-Arabic

In this paper, we have shown a script conversion (transliteration) techn...
research
09/28/2022

ArNLI: Arabic Natural Language Inference for Entailment and Contradiction Detection

Natural Language Inference (NLI) is a hot topic research in natural lang...
research
06/20/2020

AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss

Classical and some deep learning techniques for Arabic text classificati...
research
02/14/2018

Authorship Attribution Using the Chaos Game Representation

The Chaos Game Representation, a method for creating images from nucleot...
research
06/06/2023

Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text

Automatic Arabic diacritization is useful in many applications, ranging ...
research
05/07/2019

Learning meters of Arabic and English poems with Recurrent Neural Networks: a step forward for language understanding and synthesis

Recognizing a piece of writing as a poem or prose is usually easy for th...