Fine-Tashkeel: Finetuning Byte-Level Models for Accurate Arabic Text Diacritization

03/25/2023
by   Bashar Al-Rfooh, et al.

Most previous work on learning diacritization of the Arabic language relied on training models from scratch. In this paper, we investigate how to leverage pre-trained language models to learn diacritization. We finetune token-free pre-trained multilingual models (ByT5) to predict and insert missing diacritics in Arabic text, a complex task that requires understanding the sentence semantics and the morphological structure of the tokens. We show that we can achieve state-of-the-art results on the diacritization task with a minimal amount of training and no feature engineering, reducing WER by 40%. We release our finetuned models for the benefit of researchers in the community.
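To make the task setup concrete, the following is a minimal sketch of how training pairs for such a model might be built and scored. It is not the authors' code. It assumes that diacritics are the eight core tashkeel marks U+064B to U+0652, that the model input is the undiacritized text and the target is the fully diacritized text, and that WER is the fraction of words whose diacritized form differs from the reference. The function names (`strip_diacritics`, `make_pair`, `utf8_bytes`, `word_error_rate`) are hypothetical.

```python
# Assumption: the diacritic set is the eight core tashkeel marks
# (fathatan .. sukun), U+064B through U+0652. Other marks are left untouched.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Remove the assumed diacritic marks, leaving base letters intact."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def make_pair(diacritized: str) -> tuple[str, str]:
    """Build an (input, target) pair: undiacritized text -> diacritized text."""
    return strip_diacritics(diacritized), diacritized


def utf8_bytes(text: str) -> list[int]:
    """Return the UTF-8 byte values a byte-level model would consume."""
    return list(text.encode("utf-8"))


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Fraction of whitespace-separated words that differ from the reference.

    Assumption: diacritization does not change the word count, so the two
    strings are compared word by word without alignment.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same word count")
    if not ref_words:
        return 0.0
    errors = sum(r != h for r, h in zip(ref_words, hyp_words))
    return errors / len(ref_words)


if __name__ == "__main__":
    # kaf + fatha, ta + fatha, ba + fatha (the word "kataba"), written with escapes
    target = "\u0643\u064E\u062A\u064E\u0628\u064E"
    source, _ = make_pair(target)
    print(len(source), len(target))          # characters before and after
    print(len(utf8_bytes(source)))           # bytes fed to a byte-level model
    print(word_error_rate(target, source))   # undiacritized output counts as an error
```

Because each Arabic letter and each diacritic occupies two bytes in UTF-8, a byte-level model sees sequences roughly twice as long as the character count, which is worth keeping in mind when choosing maximum input and output lengths.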
