Short Text Topic Modeling Techniques, Applications, and Performance: A Survey

04/13/2019
by Qiang Jipeng, et al.

Analyzing short texts to infer discriminative and coherent latent topics is a critical and fundamental task, since many real-world applications require semantic understanding of short texts. Traditional long text topic modeling algorithms (e.g., PLSA and LDA), which rely on word co-occurrences, cannot solve this problem well because short texts provide only very limited word co-occurrence information. Short text topic modeling, which aims to overcome this sparseness, has therefore attracted much attention from the machine learning research community in recent years. In this survey, we conduct a comprehensive review of the short text topic modeling techniques proposed in the literature. We present three categories of methods, based on Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, with examples of representative approaches in each category and an analysis of their performance on various tasks. We develop the first comprehensive open-source library, called STTM, written in Java, which integrates all surveyed algorithms within a unified interface, together with benchmark datasets, to facilitate the development of new methods in this research field. Finally, we evaluate these state-of-the-art methods on many real-world datasets and compare their performance against one another and against long text topic modeling algorithms.
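To make the sparseness problem concrete, the following minimal Python sketch counts unordered word pairs per document and then pools them across a corpus, which is the intuition behind the global word co-occurrence category. It is an illustration only: the function names and the toy corpus are hypothetical, it does not use the STTM library, and it does not implement any of the surveyed algorithms.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(tokens):
    """Return the set of unordered pairs of distinct words in one document."""
    return set(combinations(sorted(set(tokens)), 2))


def global_pair_counts(docs):
    """Pool pair counts over the whole corpus (one count per document per pair)."""
    counts = Counter()
    for tokens in docs:
        counts.update(cooccurrence_pairs(tokens))
    return counts


# Hypothetical toy corpus of three short, already tokenized texts.
short_docs = [
    ["apple", "iphone", "launch"],
    ["iphone", "battery", "launch"],
    ["apple", "stock", "price"],
]

# A document with n distinct words yields only n * (n - 1) / 2 pairs,
# so each three-word text contributes just three pairs on its own.
per_doc = [len(cooccurrence_pairs(d)) for d in short_docs]

# Pooling pairs across documents recovers repeated patterns that no
# single short text can show by itself.
corpus = global_pair_counts(short_docs)

print(per_doc)
print(corpus.most_common(1))
```

Each text taken alone offers very little evidence, while the pooled counts reveal that "iphone" and "launch" appear together in more than one text.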

research
08/07/2018

STTM: A Tool for Short Text Topic Modeling

Along with the emergence and popularity of social communications on the ...
research
08/30/2017

End-to-end Learning for Short Text Expansion

Effectively making sense of short texts is a critical task for many real...
research
12/17/2014

Word Network Topic Model: A Simple but General Solution for Short and Imbalanced Texts

The short text has been the prevalent format for information of Internet...
research
08/02/2022

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

Extracting knowledge from unlabeled texts using machine learning algorit...
research
03/26/2020

Bag of biterms modeling for short texts

Analyzing texts from social media encounters many challenges due to thei...
research
03/26/2021

An Embedding-based Joint Sentiment-Topic Model for Short Texts

Short text is a popular avenue of sharing feedback, opinions and reviews...
research
05/01/2017

Stochastic Divergence Minimization for Biterm Topic Model

As the emergence and the thriving development of social networks, a huge...