Transfer Topic Modeling with Ease and Scalability

01/24/2013
by Jeon-Hyung Kang, et al.

The increasing volume of short texts generated on social media sites, such as Twitter or Facebook, creates a great demand for effective and efficient topic modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it is not optimal: it handles short texts with fast-changing topics poorly and raises scalability concerns. In this paper, we propose a transfer learning approach that utilizes abundant labeled documents from other domains (such as Yahoo! News or Wikipedia) to improve topic modeling, yielding better model fitting and result interpretation. Specifically, we develop the Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors. In addition, we develop a parallel implementation of our model for large-scale applications. We demonstrate the effectiveness of our thLDA model on both a microblogging dataset and standard text collections, including the AP and RCV1 datasets.

research
12/16/2022

Experiments on Generalizability of BERTopic on Multi-Domain Short Text

Topic modeling is widely used for analytically evaluating large collecti...
research
08/04/2017

A network approach to topic models

One of the main computational and scientific challenges in the modern ag...
research
10/05/2018

Clust-LDA: Joint Model for Text Mining and Author Group Inference

Social media corpora pose unique challenges and opportunities, including...
research
11/07/2018

Transfer Learning from LDA to BiLSTM-CNN for Offensive Language Detection in Twitter

We investigate different strategies for automatic offensive language cla...
research
11/17/2013

Towards Big Topic Modeling

To solve the big topic modeling problem, we need to reduce both time and...
research
03/26/2020

Bag of biterms modeling for short texts

Analyzing texts from social media encounters many challenges due to thei...
research
07/31/2017

Familia: An Open-Source Toolkit for Industrial Topic Modeling

Familia is an open-source toolkit for pragmatic topic modeling in indust...
