ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

12/27/2020
by Muhammad Abdul-Mageed, et al.

Masked language models (MLMs) have become an integral part of many natural language processing systems. Although multilingual MLMs have been introduced to serve many languages, they are limited in capacity and in the size and diversity of the non-English data they are pre-trained on. In this work, we remedy these issues for Arabic by introducing two powerful deep bidirectional transformer-based models, ARBERT and MARBERT, that achieve superior performance to all existing models. To evaluate them, we propose ArBench, a new benchmark for multi-dialectal Arabic language understanding. ArBench is built from 41 datasets targeting 5 different tasks/task clusters, allowing us to offer a series of standardized experiments under rich conditions. When fine-tuned on ArBench, ARBERT and MARBERT collectively achieve new SOTA results with sizeable margins over all existing models, including mBERT, XLM-R (Base and Large), and AraBERT, on 37 out of 45 classification tasks across the 41 datasets.
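
The released checkpoints can be exercised directly as masked language models. Below is a minimal sketch using the Hugging Face transformers fill-mask pipeline; the model ID "UBC-NLP/MARBERT" is assumed to be the authors' published checkpoint on the Hugging Face Hub, and the example sentence and its predictions are purely illustrative.

```python
# Minimal sketch: masked-token prediction with MARBERT via the Hugging Face
# `transformers` fill-mask pipeline.
# Assumption: the checkpoint is published as "UBC-NLP/MARBERT" on the Hub;
# swap in "UBC-NLP/ARBERT" for the MSA-focused variant.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="UBC-NLP/MARBERT")

# Arabic for "The capital of Egypt is [MASK]"; the BERT-style [MASK] token
# marks the position the model is asked to fill.
for prediction in fill_mask("عاصمة مصر هي [MASK]"):
    print(prediction["token_str"], round(prediction["score"], 3))
```

For the ArBench classification tasks, the same checkpoints load as BERT-style encoders (e.g. via AutoModelForSequenceClassification) and are fine-tuned per task in the usual way.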

Related research

12/21/2022
ORCA: A Challenging Benchmark for Arabic Language Understanding
Due to their crucial role in all NLP, several benchmarks have been propo...

06/16/2022
CS-UM6P at SemEval-2022 Task 6: Transformer-based Models for Intended Sarcasm Detection in English and Arabic
Sarcasm is a form of figurative language where the intended meaning of a...

06/11/2023
AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing
Developing monolingual large Pre-trained Language Models (PLMs) is shown...

11/18/2021
Supporting Undotted Arabic with Pre-trained Language Models
We observe a recent behaviour on social media, in which users intentiona...

08/08/2023
ChatGPT for Arabic Grammatical Error Correction
Recently, large language models (LLMs) fine-tuned to follow human instru...

04/18/2022
Exploring Dimensionality Reduction Techniques in Multilingual Transformers
Both in scientific literature and in industry, semantic and context-awa...
