AraBERT: Transformer-based Model for Arabic Language Understanding

02/28/2020
by Wissam Antoun, et al.

Arabic is a morphologically rich and complex language with relatively few resources and a less-explored syntax compared to English. Given these limitations, tasks such as Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA) have proven very challenging for Arabic. Recently, with the surge of transformer-based models, language-specific BERT-based models have proven very effective at language understanding, provided they are pre-trained on a very large corpus. Such models have set new standards and achieved state-of-the-art results on most NLP tasks. In this paper, we pre-train BERT specifically for the Arabic language, in pursuit of the same success that BERT achieved for English. We then compare the performance of AraBERT with the multilingual BERT provided by Google and with other state-of-the-art approaches. The experimental results show that the newly developed AraBERT achieves state-of-the-art results on most of the tested tasks. The pretrained AraBERT models are publicly available, in the hope of encouraging research and applications in Arabic NLP.


