DeepAI AI Chat
Log In Sign Up

Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

by   Go Inoue, et al.

We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models. Our models consistently outperform existing systems in Modern Standard Arabic and all the Arabic dialects we study, achieving 2.6 improvement over the previous state-of-the-art in Modern Standard Arabic, 2.8 in Gulf, 1.6 setups for fine-tuning pre-trained transformer language models, including training data size, the use of external linguistic resources, and the use of annotated data from other dialects in a low-resource scenario. Our results show that strategic fine-tuning using datasets from other high-resource dialects is beneficial for a low-resource dialect. Additionally, we show that high-quality morphological analyzers as external linguistic resources are beneficial especially in low-resource settings.


page 1

page 2

page 3

page 4


The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models

In this paper, we explore the effects of language variants, data sizes, ...

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning

Recently, leveraging pre-trained Transformer based language models in do...

Zero-Resource Multi-Dialectal Arabic Natural Language Understanding

A reasonable amount of annotated data is required for fine-tuning pre-tr...

Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling

A sufficient amount of annotated data is usually required to fine-tune p...

Quantifying Language Variation Acoustically with Few Resources

Deep acoustic models represent linguistic information based on massive a...

Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph

There have been many recent investigations into prompt-based training of...