DeepAI AI Chat
Log In Sign Up

AraBERT: Transformer-based Model for Arabic Language Understanding

02/28/2020
by   Wissam Antoun, et al.
American University of Beirut
0

The Arabic language is a morphologically rich and complex language with relatively little resources and a less explored syntax compared to English. Given these limitations, tasks like Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA), have proven to be very challenging to tackle. Recently, with the surge of transformers based models, language-specific BERT based models proved to have a very efficient understanding of languages, provided they are pre-trained on a very large corpus. Such models were able to set new standards and achieve state-of-the-art results for most NLP tasks. In this paper, we pre-trained BERT specifically for the Arabic language in the pursuit of achieving the same success that BERT did for the English language. We then compare the performance of AraBERT with multilingual BERT provided by Google and other state-of-the-art approaches. The results of the conducted experiments show that the newly developed AraBERT achieved state-of-the-art results on most tested tasks. The pretrained araBERT models are publicly available on hoping to encourage research and applications for Arabic NLP.

READ FULL TEXT

page 1

page 2

page 3

page 4

05/26/2020

ParsBERT: Transformer-based Model for Persian Language Understanding

The surge of pre-trained language models has begun a new era in the fiel...
04/30/2020

A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks

The Arabic language is a morphological rich language, posing many challe...
12/31/2020

AraGPT2: Pre-Trained Transformer for Arabic Language Generation

Recently, pretrained transformer-based architectures have proven to be v...
05/04/2021

HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish

BERT-based models are currently used for solving nearly all Natural Lang...
01/19/2022

Interpreting Arabic Transformer Models

Arabic is a Semitic language which is widely spoken with many dialects. ...
11/19/2019

Towards Lingua Franca Named Entity Recognition with BERT

Information extraction is an important task in NLP, enabling the automat...
02/07/2023

A Survey on Arabic Named Entity Recognition: Past, Recent Advances, and Future Trends

As more and more Arabic texts emerged on the Internet, extracting import...