Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text

06/06/2023
by Parnia Bahar, et al.

Automatic Arabic diacritization is useful in many applications, ranging from reading support for language learners to accurate pronunciation prediction for downstream tasks like speech synthesis. While most previous work has focused on models that operate on raw non-diacritized text, production systems can gain accuracy by first letting humans partially annotate ambiguous words. In this paper, we propose 2SDiac, a multi-source model that can effectively support optional diacritics in the input to inform all predictions. We also introduce Guided Learning, a training scheme that leverages diacritics given in the input with different levels of random masking. We show that hints provided at test time affect more output positions than just those annotated. Moreover, experiments on two common benchmarks show that our approach i) greatly outperforms the baseline, even when evaluated on non-diacritized text; and ii) achieves state-of-the-art results while reducing the parameter count by over 60%.
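The random masking of input diacritics mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the actual procedure, so everything below is an assumption made for illustration: the function names, masking each mark independently, and drawing a uniform masking level per example are hypothetical choices, not the authors' method.

```python
import random

# Arabic diacritic marks (tashkeel): U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = frozenset(chr(c) for c in range(0x064B, 0x0653))


def strip_diacritics(text):
    """Return the text with every diacritic mark removed."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def mask_diacritics(text, keep_prob, rng):
    """Keep each diacritic mark with probability keep_prob, drop it otherwise.

    Assumption: marks are masked independently of one another.
    """
    out = []
    for ch in text:
        if ch in DIACRITICS and rng.random() >= keep_prob:
            continue  # this mark is masked out of the hint
        out.append(ch)
    return "".join(out)


def make_training_example(diacritized, rng):
    """Build a (raw input, partial hint, target) triple from diacritized text.

    Assumption: the masking level is drawn uniformly per example, so the
    hint ranges from almost no diacritics to almost all of them.
    """
    keep_prob = rng.random()
    hint = mask_diacritics(diacritized, keep_prob, rng)
    return strip_diacritics(diacritized), hint, diacritized
```

Under these assumptions, a two-source model would receive the raw text and the partial hint as separate inputs and be trained to predict the fully diacritized target. Varying the masking level across examples exposes the model to every case between no hints and a complete annotation.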


