
JASMINE: Arabic GPT Models for Few-Shot Learning

by El Moatez Billah Nagoudi, et al.

Task-agnostic generative pretraining (GPT) has recently proved promising for zero- and few-shot learning, gradually diverting attention from the expensive supervised learning paradigm. Although the community is accumulating knowledge about the capabilities of English-language autoregressive models such as GPT-3 that adopt this generative approach, scholarship on these models remains acutely Anglocentric. As a result, the community has serious gaps in its understanding of this class of models, their potential, and their societal impacts across diverse settings, linguistic traditions, and cultures. To alleviate this issue for Arabic, a collection of diverse languages and language varieties spoken by more than 400 million people, we introduce JASMINE, a suite of powerful Arabic autoregressive Transformer language models ranging in size from 300 million to 13 billion parameters. We pretrain our new models on large amounts of diverse data (400GB of text) drawn from different Arabic varieties and domains. We evaluate JASMINE extensively in both intrinsic and extrinsic settings, using a comprehensive benchmark for zero- and few-shot learning across a wide range of NLP tasks. We also carefully develop and release a novel benchmark for both automated and human evaluation of Arabic autoregressive models, focused on investigating potential social biases, harms, and toxicity in these models. We aim to responsibly share our models with interested researchers, along with code for experimenting with them.
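In the zero- and few-shot setting described above, a task is cast as text completion: a handful of labeled demonstrations are concatenated with an unlabeled query, and the autoregressive model is asked to continue the text with the answer. The sketch below illustrates this prompt-assembly step; the template, task, and labels are illustrative assumptions, not the paper's actual benchmark format.

```python
# Minimal sketch of few-shot prompt construction for a causal LM.
# The instruction/example template is a hypothetical illustration.

def build_few_shot_prompt(instruction, examples, query):
    """Concatenate an instruction, labeled demonstrations, and an
    unlabeled query into one prompt string; the model is expected
    to generate the label as the continuation."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Label:")  # generation continues from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as positive or negative.",
    [("I loved this film.", "positive"),
     ("The service was terrible.", "negative")],
    "What a wonderful day!",
)
print(prompt)
```

In practice such a prompt would be tokenized and passed to the model's generation routine; zero-shot evaluation is the same template with the demonstrations list left empty.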



