RUBERT: A Bilingual Roman Urdu BERT Using Cross Lingual Transfer Learning

02/22/2021
by Usama Khalid, et al.

Recent studies have shown that multilingual language models underperform their monolingual counterparts, yet training and maintaining a separate monolingual model for every language is costly and time-consuming. Roman Urdu is a resource-starved language widely used on social media platforms and chat apps. In this research, we propose a novel dataset of scraped tweets containing 54M tokens and 3M sentences. We also propose RUBERT, a bilingual Roman Urdu BERT created by additional pretraining of English BERT on this corpus. We compare its performance against a monolingual Roman Urdu BERT trained from scratch and a multilingual Roman Urdu BERT created by additional pretraining of Multilingual BERT. Our experiments show that additional pretraining of English BERT yields the most notable performance improvement.
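The core technique the abstract describes is continued (additional) masked-language-model pretraining of an existing BERT checkpoint on a Roman Urdu tweet corpus. Below is a minimal sketch of that setup using the Hugging Face transformers and datasets libraries; it is not the authors' pipeline, and the corpus filename, output path, and all hyperparameters are illustrative assumptions rather than values from the paper.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumption: the scraped tweets are stored one sentence per line
# in a plain-text file (hypothetical filename).
dataset = load_dataset("text", data_files={"train": "roman_urdu_tweets.txt"})

# Start from English BERT; this is the RUBERT-style initialization.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

def tokenize(batch):
    # Truncate tweets to a short context window; 128 is an assumed value.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT masked-language-model objective with 15% token masking.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="rubert-pretraining",  # hypothetical output path
    per_device_train_batch_size=32,   # assumed hyperparameters
    num_train_epochs=1,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Swapping `checkpoint` for `bert-base-multilingual-cased` would approximate the multilingual-BERT baseline, while initializing the model from a fresh `BertConfig` instead of a pretrained checkpoint corresponds to the from-scratch monolingual baseline the paper compares against.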


