TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis

10/17/2020
by   Mohiuddin Md Abdul Qudar, et al.

Twitter is a well-known microblogging social site where users express their views and opinions in real time. As a result, tweets tend to contain valuable information. With the advancement of deep learning in natural language processing, extracting meaningful information from tweets has become a growing interest among natural language researchers. Applying existing language representation models to extract information from Twitter often does not produce good results. Moreover, there are no existing language representation models for text analysis specific to the social media domain. Hence, in this article, we introduce two TweetBERT models, which are domain-specific language representation models pre-trained on millions of tweets. We show that the TweetBERT models significantly outperform the traditional BERT models in Twitter text mining tasks by more than 7%. We also provide an extensive analysis by evaluating seven BERT models on 31 different datasets. Our results validate our hypothesis that continually training language models on a Twitter corpus helps performance on Twitter tasks.

research
09/15/2022

TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations

We present TwHIN-BERT, a multilingual language model trained on in-domai...
research
05/15/2020

COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter

In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-bas...
research
10/12/2022

Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

Norwegian Twitter data poses an interesting challenge for Natural Langua...
research
10/23/2020

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

The experimental landscape in natural language processing for social med...
research
03/14/2023

Geolocation Predicting of Tweets Using BERT-Based Models

This research is aimed to solve the tweet/user geolocation prediction ta...
research
12/07/2020

An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data

The field of NLP has seen unprecedented achievements in recent years. Mo...
research
08/16/2023

Sarcasm Detection in a Disaster Context

During natural disasters, people often use social media platforms such a...
