
ParsBERT: Transformer-based Model for Persian Language Understanding

05/26/2020
by Mehrdad Farahani, et al. (Shahed University, IAU-TNB, QUT)

The surge of pre-trained language models has ushered in a new era in Natural Language Processing (NLP) by allowing us to build powerful language models. Among these, Transformer-based models such as BERT have become increasingly popular due to their state-of-the-art performance. However, these models are usually focused on English, leaving other languages to multilingual models with limited resources. This paper proposes ParsBERT, a monolingual BERT for the Persian language, which demonstrates state-of-the-art performance compared to other architectures and to multilingual models. Moreover, since the amount of data available for Persian NLP tasks is very restricted, a massive dataset is composed, both for pre-training the model and for the downstream tasks. ParsBERT obtains higher scores on all datasets, existing as well as newly composed ones, and improves the state of the art by outperforming both multilingual BERT and prior works on Sentiment Analysis, Text Classification, and Named Entity Recognition tasks.
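The released ParsBERT checkpoints can be used like any other BERT model. A minimal sketch, assuming the Hugging Face `transformers` library and the model id `HooshvareLab/bert-base-parsbert-uncased` (the id is an assumption here, not stated in the abstract), that extracts a sentence embedding by mean-pooling token vectors:

```python
# Sketch: load ParsBERT via Hugging Face transformers and mean-pool
# token embeddings into a sentence vector. The model id below is an
# assumption; check the project repository for the released checkpoints.
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over the sequence axis, ignoring padding positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid div-by-zero
    return summed / counts

if __name__ == "__main__":
    # Requires `pip install torch transformers` and network access.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "HooshvareLab/bert-base-parsbert-uncased"  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    batch = tokenizer(["زبان فارسی"], return_tensors="pt")  # "the Persian language"
    with torch.no_grad():
        out = model(**batch)
    emb = mean_pool(out.last_hidden_state.numpy(), batch["attention_mask"].numpy())
    print(emb.shape)  # e.g. (1, 768) for a BERT-base-sized model
```

For the downstream tasks reported in the paper (sentiment analysis, text classification, NER), the same checkpoint would instead be loaded with a task head, e.g. `AutoModelForSequenceClassification` or `AutoModelForTokenClassification`, and fine-tuned on the labeled data.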


Related research

- BERTje: A Dutch BERT Model (12/19/2019)
- AraBERT: Transformer-based Model for Arabic Language Understanding (02/28/2020)
- HeRo: RoBERTa and Longformer Hebrew Language Models (04/18/2023)
- TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing (03/21/2021)
- Transformer-based approaches to Sentiment Detection (03/13/2023)
- Improving Tokenisation by Alternative Treatment of Spaces (04/08/2022)
- Overcoming Language Disparity in Online Content Classification with Multimodal Learning (05/19/2022)

Code Repositories

parsbert — 🤗 ParsBERT: Transformer-based Model for Persian Language Understanding