Probit Normal Correlated Topic Models

10/03/2014
by Xingchen Yu, et al.

The logistic normal distribution has recently been adapted, via the transformation of multivariate Gaussian variables, to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative for modelling correlated topical structures. Our use of the probit model in the context of topic discovery is novel, as many authors have so far concentrated solely on the logistic model, partly because of the formidable inefficiency of the multinomial probit model even in the case of very small topical spaces. We circumvent the inefficiency of multinomial probit estimation by adapting the diagonal orthant multinomial probit to the topic modelling context, which allows our scheme to handle corpora with a large number of latent topics. An additional and very important benefit of our method is that, unlike the logistic normal model, whose non-conjugacy leads to the need for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. Applying the proposed scheme to a well-known Associated Press corpus not only helps discover a large number of meaningful topics but also captures compellingly intuitive correlations among certain topics. In addition, the proposed approach lends itself to even further scalability thanks to various existing high-performance algorithms and architectures capable of handling millions of documents.
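To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It draws a correlated Gaussian vector over three topics and maps it to topic proportions in two ways: the logistic normal (softmax) map, and a probit-style map in which each proportion is taken proportional to Phi(eta_k) / (1 - Phi(eta_k)). The second map is an assumption based on one common reading of the diagonal orthant construction; it is not stated in the abstract. All function names, the mean vector, and the covariance matrix are hypothetical and chosen for illustration only.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_eta(mu, chol, rng):
    """Draw eta ~ N(mu, Sigma), where chol is the Cholesky factor of Sigma."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mu)]


def softmax(eta):
    """Logistic normal map: exponentiate and normalise."""
    top = max(eta)
    w = [math.exp(e - top) for e in eta]
    total = sum(w)
    return [x / total for x in w]


def probit_odds_map(eta):
    """Assumed probit-style map: p_k proportional to Phi(eta_k) / Phi(-eta_k)."""
    # erfc keeps both tails accurate: Phi(x) = 0.5 * erfc(-x / sqrt(2)).
    root2 = math.sqrt(2.0)
    odds = [math.erfc(-e / root2) / math.erfc(e / root2) for e in eta]
    total = sum(odds)
    return [o / total for o in odds]


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.0, 0.0, 0.0]              # illustrative mean
    sigma = [[1.0, 0.8, 0.0],         # topics 0 and 1 positively correlated
             [0.8, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
    eta = draw_eta(mu, cholesky(sigma), rng)
    print("logistic normal:", softmax(eta))
    print("probit-style:   ", probit_odds_map(eta))
```

Both maps return a vector of positive proportions that sums to one, so either can serve as a per-document topic distribution. The difference the abstract emphasises lies in inference rather than in this forward map: the probit route admits an auxiliary Gaussian formulation, which is what the authors credit for the simpler sampling scheme.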

research
01/06/2020

Topic Extraction of Crawled Documents Collection using Correlated Topic Model in MapReduce Framework

The tremendous increase in the amount of available research documents im...
research
11/02/2018

Dirichlet belief networks for topic structure learning

Recently, considerable research effort has been devoted to developing de...
research
07/13/2021

Semiparametric Latent Topic Modeling on Consumer-Generated Corpora

Legacy procedures for topic modelling have generally suffered problems o...
research
11/22/2021

HTMOT : Hierarchical Topic Modelling Over Time

Over the years, topic models have provided an efficient way of extractin...
research
01/26/2023

Neural Dynamic Focused Topic Model

Topic models and all their variants analyse text by learning meaningful ...
research
07/29/2016

TopicResponse: A Marriage of Topic Modelling and Rasch Modelling for Automatic Measurement in MOOCs

This paper explores the suitability of using automatically discovered to...
research
08/25/2023

Measuring Spurious Correlation in Classification: 'Clever Hans' in Translationese

Recent work has shown evidence of 'Clever Hans' behavior in high-perform...
