Supporting Undotted Arabic with Pre-trained Language Models

11/18/2021
by Aviad Rom, et al.

We observe a recent behaviour on social media in which users intentionally remove consonantal dots from Arabic letters to bypass content-classification algorithms. Content classification is typically done by fine-tuning pre-trained language models, which many natural-language-processing applications have recently adopted. In this work we study the effect of applying pre-trained Arabic language models to "undotted" Arabic texts. We suggest several ways of supporting undotted texts with pre-trained models, without additional training, and measure their performance on two Arabic natural-language-processing downstream tasks. The results are encouraging; in one of the tasks our method shows nearly perfect performance.
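
To make the notion of "undotted" text concrete, the following is a minimal Python sketch of how dots might be stripped from Arabic letters. The mapping table `UNDOT_MAP` and the function `undot` are hypothetical names chosen for illustration, and the choice of dotless target characters is an assumption; the abstract does not specify the exact character mapping the authors use.

```python
# Illustrative sketch only: this mapping is an assumption, not the paper's table.
# Each dotted Arabic letter is mapped to a dotless letter with the same base shape.
UNDOT_MAP = {
    "\u0628": "\u066E",  # beh -> dotless beh
    "\u062A": "\u066E",  # teh -> dotless beh
    "\u062B": "\u066E",  # theh -> dotless beh
    "\u062C": "\u062D",  # jeem -> hah
    "\u062E": "\u062D",  # khah -> hah
    "\u0630": "\u062F",  # thal -> dal
    "\u0632": "\u0631",  # zain -> reh
    "\u0634": "\u0633",  # sheen -> seen
    "\u0636": "\u0635",  # dad -> sad
    "\u0638": "\u0637",  # zah -> tah
    "\u063A": "\u0639",  # ghain -> ain
    "\u0641": "\u06A1",  # feh -> dotless feh
    "\u0642": "\u066F",  # qaf -> dotless qaf
    "\u0646": "\u06BA",  # noon -> noon ghunna (dotless noon shape)
    "\u064A": "\u0649",  # yeh -> alef maksura (dotless yeh shape)
    "\u0629": "\u0647",  # teh marbuta -> heh
}

# Precompute a translation table so undotting is a single pass over the string.
_UNDOT_TABLE = str.maketrans(UNDOT_MAP)


def undot(text: str) -> str:
    """Return text with dotted Arabic letters replaced by dotless look-alikes.

    Characters not present in UNDOT_MAP (including non-Arabic text) are unchanged.
    """
    return text.translate(_UNDOT_TABLE)
```

Because several dotted letters collapse onto one dotless shape, the transformation is lossy: distinct words can become identical after undotting, which is one plausible reason a model trained on dotted text would struggle with such input.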

research
06/11/2023

AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

Developing monolingual large Pre-trained Language Models (PLMs) is shown...
research
03/11/2021

The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models

In this paper, we explore the effects of language variants, data sizes, ...
research
05/18/2023

A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model

In this work, we explore Parameter-Efficient-Learning (PEL) techniques t...
research
04/06/2023

Investigating Chain-of-thought with ChatGPT for Stance Detection on Social Media

Stance detection predicts attitudes towards targets in texts and has gai...
research
06/29/2021

New Arabic Medical Dataset for Diseases Classification

The Arabic language suffers from a great shortage of datasets suitable f...
research
01/22/2021

BERT Transformer model for Detecting Arabic GPT2 Auto-Generated Tweets

During the last two decades, we have progressively turned to the Interne...
research
12/27/2020

ARBERT MARBERT: Deep Bidirectional Transformers for Arabic

Masked language models (MLM) have become an integral part of many natura...
