ParsBERT: Transformer-based Model for Persian Language Understanding

05/26/2020
by Mehrdad Farahani, et al.

The surge of pre-trained language models has ushered in a new era in Natural Language Processing (NLP) by enabling the construction of powerful language models. Among these, Transformer-based models such as BERT have become increasingly popular due to their state-of-the-art performance. However, such models are usually trained on English, leaving other languages to multilingual models with limited resources. This paper proposes a monolingual BERT for the Persian language (ParsBERT), which achieves state-of-the-art performance compared to other architectures and to multilingual models. Since the amount of data available for Persian NLP tasks is very limited, a massive dataset is composed for several NLP tasks as well as for pre-training the model. ParsBERT obtains higher scores on all datasets, both existing and newly composed, and improves the state of the art by outperforming multilingual BERT and other prior work on Sentiment Analysis, Text Classification, and Named Entity Recognition tasks.


Related research:

12/19/2019 · BERTje: A Dutch BERT Model
The transformer-based pre-trained language model BERT has helped to impr...

02/28/2020 · AraBERT: Transformer-based Model for Arabic Language Understanding
The Arabic language is a morphologically rich and complex language with ...

09/19/2023 · Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi
One of the most popular downstream tasks in the field of Natural Languag...

04/18/2023 · HeRo: RoBERTa and Longformer Hebrew Language Models
In this paper, we fill in an existing gap in resources available to the ...

03/21/2021 · TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Various robustness evaluation methodologies from different perspectives ...

03/13/2023 · Transformer-based approaches to Sentiment Detection
The use of transfer learning methods is largely responsible for the pres...

04/08/2022 · Improving Tokenisation by Alternative Treatment of Spaces
Tokenisation is the first step in almost all NLP tasks, and state-of-the...
