RoBERTweet: A BERT Language Model for Romanian Tweets

06/11/2023
by   Iulian-Marius Tăiatu, et al.

Developing natural language processing (NLP) systems for social media analysis remains an important topic in artificial intelligence research. This article introduces RoBERTweet, the first Transformer-based language model trained on Romanian tweets. RoBERTweet comes in two versions, following the base and large architectures of BERT. The corpus used for pre-training the models is new to the Romanian NLP community and consists of all tweets collected from 2008 to 2022. Experiments show that RoBERTweet models outperform previous general-domain Romanian and multilingual language models on three NLP tasks with tweet inputs: emotion detection, sexist language identification, and named entity recognition. We make our models and the newly created corpus of Romanian tweets freely available.

research
09/21/2021

BERTweetFR : Domain Adaptation of Pre-Trained Language Models for French Tweets

We introduce BERTweetFR, the first large-scale pre-trained language mode...
research
04/02/2023

MMT: A Multilingual and Multi-Topic Indian Social Media Dataset

Social media plays a significant role in cross-cultural communication. A...
research
05/20/2020

BERTweet: A pre-trained language model for English Tweets

We present BERTweet, the first public large-scale pre-trained language m...
research
06/25/2023

Revolutionizing Cyber Threat Detection with Large Language Models

Natural Language Processing (NLP) domain is experiencing a revolution du...
research
04/20/2020

The Panacea Threat Intelligence and Active Defense Platform

We describe Panacea, a system that supports natural language processing ...
research
08/23/2023

Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models

Foundational Language Models (FLMs) have advanced natural language proce...
research
09/10/2021

FBERT: A Neural Transformer for Identifying Offensive Content

Transformer-based models such as BERT, XLNET, and XLM-R have achieved st...
