SimLDA: A tool for topic model evaluation

08/19/2022
by Rebecca M. C. Taylor, et al.

Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While VB is sufficiently successful at extracting topics from large text corpora, it is less successful at identifying aspects when data is limited. We present a novel variational message passing algorithm for LDA and compare it with the gold-standard VB and collapsed Gibbs sampling. Where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations; where conjugacy holds, we use loopy belief update (LBU), also known as Lauritzen-Spiegelhalter. Our algorithm, ALBU (approximate LBU), has strong similarities with variational message passing (VMP), the message passing variant of VB. To compare the algorithms' performance in the presence of limited data, we use data sets consisting of tweets and newsgroups. Using coherence measures, we show that ALBU learns latent distributions more accurately than VB does, especially for smaller data sets.
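
The abstract does not say which coherence measures are used. As an illustration only, the sketch below assumes the UMass coherence score, one common choice, which rewards topics whose top words tend to appear in the same documents. The function name `umass_coherence`, its parameters, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math


def umass_coherence(top_words, documents, eps=1.0):
    """UMass-style coherence of one topic (an assumed measure, not
    necessarily the one used in the paper).

    top_words: a topic's top words, ordered from most to least probable.
    documents: iterable of tokenised documents (lists of words).
    eps: smoothing count that keeps the logarithm finite.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # w_j never occurs, so the pair is undefined; skip it
            score += math.log((doc_freq(w_i, w_j) + eps) / d_j)
    return score


# Toy corpus, purely illustrative.
docs = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"]]
coherent = umass_coherence(["cat", "dog"], docs)
less_coherent = umass_coherence(["cat", "fish"], docs)
```

On this toy corpus, "cat" and "dog" co-occur in two of three documents while "cat" and "fish" co-occur in only one, so the first word pair receives the higher score. Comparing such scores across algorithms is one way to rank the topics they learn.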

research
10/01/2021

ALBU: An approximate Loopy Belief message passing algorithm for LDA to improve performance on small data sets

Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has ...
research
11/02/2021

Variational message passing (VMP) applied to LDA

Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) is t...
research
04/01/2021

Bayesian Functional Principal Components Analysis via Variational Message Passing

Functional principal components analysis is a popular tool for inference...
research
06/11/2015

Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

Topic models, and more specifically the class of Latent Dirichlet Alloca...
research
05/15/2014

Topic words analysis based on LDA model

Social network analysis (SNA), which is a research field describing and ...
research
05/01/2019

Nested Variational Autoencoder for Topic Modeling on Microtexts with Word Vectors

Most of the information on the Internet is represented in the form of mi...
research
07/08/2022

Twitmo: A Twitter Data Topic Modeling and Visualization Package for R

We present Twitmo, a package that provides a broad range of methods to c...
