Improved Topic modeling in Twitter through Community Pooling

12/20/2021
by Federico Albanese, et al.

Social networks play a fundamental role in the propagation of information and news. Characterizing the content of the messages becomes vital for different tasks, such as breaking news detection, personalized message recommendation, fake user detection, information flow characterization and others. However, Twitter posts are short and often less coherent than other text documents, which makes it challenging to apply text mining algorithms to these datasets efficiently. Tweet-pooling (aggregating tweets into longer documents) has been shown to improve automatic topic decomposition, but the performance achieved in this task varies depending on the pooling method. In this paper, we propose a new pooling scheme for topic modeling in Twitter, which groups tweets whose authors belong to the same community (a group of users who mainly interact with each other but not with other groups) on a user interaction graph. We present a complete evaluation of this methodology, state-of-the-art schemes and previous pooling models in terms of cluster quality, document retrieval performance and supervised machine learning classification score. Results show that our Community pooling method outperformed other methods on the majority of metrics in two heterogeneous datasets, while also reducing the running time. This is useful when dealing with large amounts of noisy and short user-generated social media texts. Overall, our findings contribute to an improved methodology for identifying the latent topics in a Twitter dataset, without the need to modify the basic machinery of a topic decomposition model.
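The pooling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(author, text)` tweet format and the toy data are all hypothetical, and connected components (via union-find) are used only as a deliberately simple stand-in for a real community detection algorithm on the user interaction graph.

```python
from collections import defaultdict


def find_communities(edges):
    """Map each user to a community id.

    ASSUMPTION: connected components stand in for a proper community
    detection algorithm; the abstract does not say which one is used.
    """
    parent = {}

    def find(user):
        # Union-find lookup with path halving.
        parent.setdefault(user, user)
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return {user: find(user) for user in list(parent)}


def community_pooling(tweets, edges):
    """Concatenate tweets whose authors share a community.

    tweets: iterable of (author, text) pairs (hypothetical format).
    edges:  iterable of (user, user) interactions, e.g. retweets or mentions.
    Returns a dict mapping community id to one pooled document.
    """
    community_of = find_communities(edges)
    pools = defaultdict(list)
    for author, text in tweets:
        # Authors missing from the graph fall back to a pool of their own.
        pools[community_of.get(author, author)].append(text)
    return {cid: " ".join(texts) for cid, texts in pools.items()}


# Toy data, invented purely for illustration.
edges = [("ana", "ben"), ("ben", "cho"), ("dee", "eli")]
tweets = [
    ("ana", "goal scored"),
    ("cho", "great match"),
    ("dee", "new budget vote"),
    ("zed", "hello"),
]
pools = community_pooling(tweets, edges)
```

Each pooled document can then be passed to an unmodified topic model in place of the individual tweets, which is the sense in which the scheme leaves the topic decomposition machinery untouched.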
