Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

10/13/2021
by Go Inoue, et al.

We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models. Our models consistently outperform existing systems in Modern Standard Arabic and in all the Arabic dialects we study, improving on the previous state-of-the-art by 2.6 in Modern Standard Arabic, 2.8 in Gulf, 1.6 [...] setups for fine-tuning pre-trained transformer language models, including training data size, the use of external linguistic resources, and the use of annotated data from other dialects in a low-resource scenario. Our results show that strategic fine-tuning on datasets from other, high-resource dialects benefits a low-resource dialect. Additionally, we show that high-quality morphological analyzers, used as external linguistic resources, are especially beneficial in low-resource settings.
