
ParsBERT: Transformer-based Model for Persian Language Understanding

by Mehrdad Farahani, et al.
Shahed University

The surge of pre-trained language models has ushered in a new era in Natural Language Processing (NLP) by enabling the construction of powerful language models. Among these, Transformer-based models such as BERT have become increasingly popular owing to their state-of-the-art performance. However, such models are usually trained on English, leaving other languages to multilingual models with limited resources. This paper proposes a monolingual BERT for the Persian language (ParsBERT), which demonstrates state-of-the-art performance compared to other architectures and to multilingual models. Moreover, since the amount of data available for NLP tasks in Persian is very restricted, a massive dataset is composed for several NLP tasks as well as for pre-training the model. ParsBERT obtains higher scores on all datasets, both existing and newly composed ones, and improves the state of the art by outperforming both multilingual BERT and prior works on Sentiment Analysis, Text Classification, and Named Entity Recognition tasks.
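As a minimal sketch of how such a monolingual checkpoint is typically consumed downstream, the snippet below loads a ParsBERT tokenizer and encoder via the Hugging Face `transformers` library. The model identifier `HooshvareLab/bert-base-parsbert-uncased` and the 768-dimensional hidden size are assumptions based on standard BERT-base conventions, not details stated in this abstract.

```python
def load_parsbert(model_id: str = "HooshvareLab/bert-base-parsbert-uncased"):
    """Load a ParsBERT tokenizer and encoder from the Hugging Face Hub.

    The import is deferred so the function can be defined without
    `transformers` installed; calling it downloads the checkpoint.
    The model id is an assumption, not taken from the abstract.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


# Hypothetical usage (downloads the checkpoint on first call):
#   tokenizer, model = load_parsbert()
#   inputs = tokenizer("این یک جمله فارسی است", return_tensors="pt")
#   hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, hidden_size)
```

The contextual embeddings returned by the encoder would then feed a task head (e.g. a classifier for sentiment analysis or a token-level layer for NER), which is the usual fine-tuning setup for the tasks the paper evaluates.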



Code Repositories


🤗 ParsBERT: Transformer-based Model for Persian Language Understanding
