
Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling

by Muhammad Khalifa, et al.

A sufficient amount of annotated data is usually required to fine-tune pre-trained language models for downstream tasks. Unfortunately, obtaining labeled data can be costly, especially for multiple language varieties and dialects. We propose to self-train pre-trained language models in zero- and few-shot scenarios to improve performance on data-scarce varieties using only resources from data-rich ones. We demonstrate the utility of our approach in the context of Arabic sequence labeling by using a language model fine-tuned only on Modern Standard Arabic (MSA) to predict named entities (NE) and part-of-speech (POS) tags on several dialectal Arabic (DA) varieties. We show that self-training is indeed powerful, improving zero-shot MSA-to-DA transfer by as much as ~10% F1 (NER) and 2% accuracy (POS tagging). We achieve even better performance in few-shot scenarios with limited amounts of labeled data. We conduct an ablation study and show that the observed performance boost results directly from the unlabeled DA examples used for self-training. Our work opens up opportunities for developing DA models that exploit only MSA resources, and it can be extended to other languages and tasks. Our code and fine-tuned models can be accessed at
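To illustrate the general idea of self-training described in the abstract, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how pseudo-labeled examples are selected, so the confidence threshold, the number of rounds, and the toy nearest-centroid classifier (standing in for a fine-tuned language model) are all assumptions made for illustration. The labeled set plays the role of the MSA data and the unlabeled pool plays the role of the DA examples.

```python
import math


def fit_centroids(examples):
    """Toy stand-in for fine-tuning: one mean per label over 1-D features."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(centroids, x):
    """Return (label, confidence) using a softmax over negative distances."""
    scores = {y: math.exp(-abs(x - c)) for y, c in centroids.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Hypothetical self-training loop (threshold and rounds are assumptions).

    Each round pseudo-labels the unlabeled pool with the current model,
    moves confident predictions into the training set, and refits.
    """
    train = list(labeled)
    pool = list(unlabeled)
    model = fit_centroids(train)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:
            break  # nothing new to learn from; stop early
        train.extend(confident)
        pool = [x for x, _ in remaining]
        model = fit_centroids(train)
    return model, train
```

Calling `self_train([(0.0, "O"), (4.0, "PER")], [0.5, 3.5, 5.0, 2.1])` adds the three unlabeled points that the model labels confidently and leaves the ambiguous point `2.1` out of the training set. In a realistic setting, `fit_centroids` would be replaced by fine-tuning a pre-trained model and the confidence would come from its token-level output probabilities.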



Zero-Resource Multi-Dialectal Arabic Natural Language Understanding

A reasonable amount of annotated data is required for fine-tuning pre-tr...

Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

We present state-of-the-art results on morphosyntactic tagging across di...

A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks

The Arabic language is a morphological rich language, posing many challe...

WeLM: A Well-Read Pre-trained Language Model for Chinese

Large Language Models pre-trained with self-supervised learning have dem...

To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment

There has been mounting evidence that pretrained language models fine-tu...

Zero-Shot Dialogue Disentanglement by Self-Supervised Entangled Response Selection

Dialogue disentanglement aims to group utterances in a long and multi-pa...

On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex

Semantic parsing is a technique aimed at constructing a structured repre...