ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

by Muhammad Abdul-Mageed, et al.

Masked language models (MLMs) have become an integral part of many natural language processing systems. Although multilingual MLMs have been introduced to serve many languages, they are limited in capacity and in the size and diversity of the non-English data they are pre-trained on. In this work, we remedy these issues for Arabic by introducing two powerful deep bidirectional transformer-based models, ARBERT and MARBERT, which achieve superior performance to all existing models. To evaluate our models, we propose ArBench, a new benchmark for multi-dialectal Arabic language understanding. ArBench is built from 41 datasets targeting 5 different tasks/task clusters, allowing us to offer a series of standardized experiments under rich conditions. When fine-tuned on ArBench, ARBERT and MARBERT collectively achieve a new SOTA by sizeable margins over all existing models, such as mBERT, XLM-R (Base and Large), and AraBERT, on 37 of the 45 classification tasks across the 41 datasets.
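As background on the masked-language-model objective the abstract refers to, BERT-style pretraining corrupts roughly 15% of input tokens and trains the model to recover them; of the selected positions, 80% become a [MASK] token, 10% are swapped for a random vocabulary token, and 10% are left unchanged. The sketch below illustrates only this standard corruption scheme, not the authors' training pipeline; the function and vocabulary names are illustrative.

```python
import random

MASK = "[MASK]"
# Toy vocabulary for the random-replacement branch (illustrative only).
VOCAB = ["kitab", "qalam", "bayt", "madina"]

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style MLM corruption.

    Returns (corrupted, labels): labels[i] holds the original token at
    positions chosen as prediction targets, and None elsewhere.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # this position must be predicted
            branch = rng.random()
            if branch < 0.8:
                corrupted[i] = MASK            # 80%: replace with [MASK]
            elif branch < 0.9:
                corrupted[i] = rng.choice(VOCAB)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return corrupted, labels
```

Positions where `labels[i]` is `None` are never altered, so the loss is computed only over the selected targets.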





