AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

06/11/2023
by Asaad AlGhamdi, et al.

Developing monolingual large Pre-trained Language Models (PLMs) has proven very successful across a wide range of Natural Language Processing (NLP) tasks. In this work, we present AraMUS, the largest Arabic PLM to date, with 11B parameters trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performance on a diverse set of Arabic classification and generative tasks. Moreover, AraMUS shows impressive few-shot learning abilities compared with the best existing Arabic PLMs.
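The few-shot setting mentioned above typically means conditioning the model on a handful of labeled demonstrations in the prompt, with no gradient updates. The sketch below shows only the prompt-construction step for a hypothetical Arabic sentiment task; the task instruction, labels, and example texts are illustrative assumptions, not taken from the AraMUS evaluation suite.

```python
# Minimal sketch of few-shot prompt construction (illustrative only).
# The Arabic task, labels, and examples are hypothetical, not from the paper.

def build_few_shot_prompt(examples, query, task_instruction):
    """Concatenate labeled demonstrations with an unlabeled query,
    the standard few-shot format fed to a text-to-text PLM."""
    parts = [task_instruction]
    for text, label in examples:
        parts.append(f"نص: {text}\nتصنيف: {label}")
    # The query is appended with an empty label slot for the model to fill.
    parts.append(f"نص: {query}\nتصنيف:")
    return "\n\n".join(parts)

demos = [
    ("الخدمة ممتازة والتوصيل سريع", "إيجابي"),  # "excellent service" -> positive
    ("المنتج سيء ولا أنصح به", "سلبي"),          # "bad product" -> negative
]
prompt = build_few_shot_prompt(demos, "تجربة رائعة بكل المقاييس", "صنّف النص التالي:")
print(prompt)
```

The resulting string would then be passed to the model's generation API; the model's continuation after the final "تصنيف:" is read off as the predicted label.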

Related research:

- 11/18/2021 · Supporting Undotted Arabic with Pre-trained Language Models: "We observe a recent behaviour on social media, in which users intentiona..."
- 01/23/2022 · A Large and Diverse Arabic Corpus for Language Modeling: "Language models (LMs) have introduced a major paradigm shift in Natural ..."
- 02/07/2021 · An open access NLP dataset for Arabic dialects: Data collection, labeling, and model construction: "Natural Language Processing (NLP) is today a very active field of resear..."
- 05/19/2023 · A Sequence-to-Sequence Approach for Arabic Pronoun Resolution: "This paper proposes a sequence-to-sequence learning approach for Arabic ..."
- 12/27/2020 · ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic: "Masked language models (MLM) have become an integral part of many natura..."
- 07/10/2012 · Arabic CALL system based on pedagogically indexed text: "This article introduces the benefits of using computer as a tool for for..."
- 07/11/2023 · Objaverse-XL: A Universe of 10M+ 3D Objects: "Natural language processing and 2D vision models have attained remarkabl..."
