Improving Arabic Diacritization by Learning to Diacritize and Translate

09/29/2021
by Brian Thompson, et al.

We propose a novel multitask learning method for diacritization that trains a model to both diacritize and translate. Our method addresses data sparsity by exploiting large, readily available bitext corpora. Furthermore, translation requires implicit linguistic and semantic knowledge, which helps resolve ambiguities in the diacritization task. We apply our method to the Penn Arabic Treebank and report a new state-of-the-art word error rate of 4.79%. […] method and highlight some of the remaining challenges in diacritization.
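The abstract reports a word error rate (WER) but does not define it. A common definition in diacritization work counts a word as an error if any of its diacritics differ from the reference. The sketch below illustrates that definition only; it is not the authors' evaluation script. The function names (`strip_diacritics`, `diacritization_wer`), the diacritic set (U+064B to U+0652: tanween, short vowels, shadda, sukun), and the assumption that words are already aligned one-to-one are all hypothetical choices; the paper's handling of case endings, punctuation, or digits is not specified here.

```python
import unicodedata

# Assumed diacritic set: U+064B..U+0652 (fathatan through sukun).
ARABIC_DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def _normalize(word: str) -> str:
    # NFC applies canonical ordering, so e.g. shadda+fatha and fatha+shadda
    # compare equal regardless of the order they were typed in.
    return unicodedata.normalize("NFC", word)


def strip_diacritics(word: str) -> str:
    """Return the undiacritized base letters of a word (hypothetical helper)."""
    return "".join(ch for ch in _normalize(word) if ch not in ARABIC_DIACRITICS)


def diacritization_wer(reference: list[str], hypothesis: list[str]) -> float:
    """Percentage of words whose diacritized form differs from the reference.

    Assumes both sequences are tokenized into aligned words that share the
    same base letters (a diacritizer only adds marks, it does not edit letters).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if not reference:
        return 0.0
    errors = 0
    for ref, hyp in zip(reference, hypothesis):
        if strip_diacritics(ref) != strip_diacritics(hyp):
            raise ValueError(f"base letters differ: {ref!r} vs {hyp!r}")
        if _normalize(ref) != _normalize(hyp):
            errors += 1  # any diacritic mismatch makes the whole word wrong
    return 100.0 * errors / len(reference)


if __name__ == "__main__":
    ref = ["\u0643\u064E\u062A\u064E\u0628\u064E", "\u0645\u0650\u0646"]  # kataba, min
    hyp = ["\u0643\u064E\u062A\u064E\u0628\u0652", "\u0645\u0650\u0646"]  # katab (sukun), min
    print(diacritization_wer(ref, hyp))
```

Under this definition, a single wrong mark on the final letter (here sukun instead of fatha) makes the whole word an error, which is why WER is typically much higher than a per-character diacritic error rate.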

research
02/04/2020

Arabic Diacritic Recovery Using a Feature-Rich biLSTM Model

Diacritics (short vowels) are typically omitted when writing Arabic text...
research
09/07/2015

Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation

Statistical machine translation for dialectal Arabic is characterized by...
research
08/06/2015

Using Linguistic Analysis to Translate Arabic Natural Language Queries to SPARQL

The logic-based machine-understandable framework of the Semantic Web oft...
research
08/06/2023

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

Large language models (LLMs) finetuned to follow human instructions have...
research
11/29/2022

New Results for the Text Recognition of Arabic Maghribī Manuscripts – Managing an Under-resourced Script

HTR models development has become a conventional step for digital humani...
research
04/11/2018

Problem of Multiple Diacritics Design for Arabic Script

This study focuses on the design of multiple Arabic diacritical marks an...