An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data

12/07/2020
by   Lili Wang, et al.
8

The field of NLP has seen unprecedented achievements in recent years. Most notably, with the advent of large-scale pre-trained Transformer-based language models, such as BERT, there has been a noticeable improvement in text representation. It is, however, unclear whether these improvements translate to noisy user-generated text, such as tweets. In this paper, we present an experimental survey of a wide range of well-known text representation techniques for the task of text clustering on noisy Twitter data. Our results indicate that the more advanced models do not necessarily work best on tweets and that more exploration in this area is needed.

READ FULL TEXT
research
09/15/2022

TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations

We present TwHIN-BERT, a multilingual language model trained on in-domai...
research
10/17/2020

TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis

Twitter is a well-known microblogging social site where users express th...
research
09/21/2021

BERTweetFR : Domain Adaptation of Pre-Trained Language Models for French Tweets

We introduce BERTweetFR, the first large-scale pre-trained language mode...
research
07/02/2021

Language Identification of Hindi-English tweets using code-mixed BERT

Language identification of social media text has been an interesting pro...
research
09/29/2020

Gender prediction using limited Twitter Data

Transformer models have shown impressive performance on a variety of NLP...
research
02/08/2022

TimeLMs: Diachronic Language Models from Twitter

Despite its importance, the time variable has been largely neglected in ...
research
04/01/2021

Famous Companies Use More Letters in Logo:A Large-Scale Analysis of Text Area in Logo

This paper analyzes a large number of logo images from the LLD-logo data...

Please sign up or login with your details

Forgot password? Click here to reset