Document Generation with Hierarchical Latent Tree Models

12/12/2017
by   Peixian Chen, et al.

In most probabilistic topic models, a document is viewed as a collection of tokens, and each token is a variable whose values are all the words in a vocabulary. One exception is hierarchical latent tree models (HLTMs), where a document is viewed as a binary vector over the vocabulary and each word is regarded as a binary variable. The use of word variables allows patterns of word co-occurrence, and co-occurrences of those patterns, to be detected and represented qualitatively using multiple levels of latent variables, and it leads naturally to a method for hierarchical topic detection. In this paper, we assume that an HLTM has been learned from binary data and extend it to take word frequencies into consideration. The idea is to replace each binary word variable with a real-valued variable that represents the relative frequency of the word in a document. A document generation process is proposed, and an algorithm is given for estimating the model parameters by inverting the generation process. Empirical results show that our method significantly outperforms the commonly used LDA-based methods for hierarchical topic detection in terms of model quality and the meaningfulness of topics and topic hierarchies.
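
To make the two document representations concrete, the following minimal sketch contrasts a binary word vector with a vector of relative word frequencies. It is an illustration only, not the paper's generation process or estimation algorithm. The function names, the toy vocabulary, and the choice to normalize by the number of in-vocabulary tokens are all assumptions made for this example.

```python
from collections import Counter


def binary_vector(tokens, vocabulary):
    """Binary representation: 1 if the word occurs in the document, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def relative_frequencies(tokens, vocabulary):
    """Real-valued representation: share of the document's tokens taken by each word.

    Assumption: frequencies are normalized by the count of in-vocabulary
    tokens, so the vector sums to 1 for any non-empty document.
    """
    vocab_set = set(vocabulary)
    counts = Counter(t for t in tokens if t in vocab_set)
    total = sum(counts.values())
    if total == 0:
        # No in-vocabulary tokens: return an all-zero vector.
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]


# Hypothetical toy data, for illustration only.
vocabulary = ["topic", "model", "tree", "word"]
tokens = "topic model topic tree topic".split()

binary = binary_vector(tokens, vocabulary)
frequencies = relative_frequencies(tokens, vocabulary)
```

In this toy document, "topic" occurs three times, yet the binary vector records only that it is present; the relative-frequency vector keeps that information, which is the motivation for the extension described above.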


research
05/21/2016

Latent Tree Models for Hierarchical Topic Detection

We present a novel method for hierarchical topic detection where topics ...
research
08/05/2015

Progressive EM for Latent Tree Models and Hierarchical Topic Detection

Hierarchical latent tree analysis (HLTA) is recently proposed as a new m...
research
07/10/2020

Handling Collocations in Hierarchical Latent Tree Analysis for Topic Modeling

Topic modeling has been one of the most active research areas in machine...
research
12/29/2013

Probabilistic Archetypal Analysis

Archetypal analysis represents a set of observations as convex combinati...
research
08/12/2020

Neural Sinkhorn Topic Model

In this paper, we present a new topic modelling approach via the theory ...
research
10/23/2020

Topic Modeling with Contextualized Word Representation Clusters

Clustering token-level contextualized word representations produces outp...
research
10/26/2022

ProSiT! Latent Variable Discovery with PROgressive SImilarity Thresholds

The most common ways to explore latent document dimensions are topic mod...
