Latent Tree Models for Hierarchical Topic Detection

05/21/2016
by Peixian Chen, et al.

We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables, with those at the lowest latent level representing word co-occurrence patterns and those at higher levels representing co-occurrence of patterns at the level below. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. Unlike LDA-based topic models, HLTMs do not refer to a document generation process and use word variables instead of token variables. They use a tree structure to model the relationships between topics and words, which is conducive to the discovery of meaningful topics and topic hierarchies.
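As a rough illustration of the representation described above, the sketch below turns tokenized documents into binary word-presence vectors and computes the soft assignment given by a single binary latent variable whose children are word variables. It is a toy with one latent variable, not the HLTM learning algorithm from the paper. The vocabulary, the probabilities, and the function names `binarize` and `posterior_z1` are all hypothetical, chosen by hand for illustration.

```python
import math

def binarize(doc_tokens, vocab):
    """Map a tokenized document to 0/1 word-presence indicators.

    Each vocabulary word becomes one binary variable (a word variable),
    so repeated tokens of the same word make no difference.
    """
    present = set(doc_tokens)
    return [1 if w in present else 0 for w in vocab]

def posterior_z1(x, prior_z1, p_word_given_z):
    """Return P(Z=1 | x) for one binary latent variable Z.

    p_word_given_z[z][i] is P(word_i = 1 | Z = z). The word variables are
    assumed conditionally independent given Z, as in a tree with Z as parent.
    """
    log_joint = []
    for z, prior in ((0, 1.0 - prior_z1), (1, prior_z1)):
        lp = math.log(prior)
        for xi, p in zip(x, p_word_given_z[z]):
            lp += math.log(p if xi == 1 else 1.0 - p)
        log_joint.append(lp)
    # Normalize in log space for numerical stability.
    m = max(log_joint)
    w0, w1 = (math.exp(v - m) for v in log_joint)
    return w1 / (w0 + w1)

# Hypothetical vocabulary and hand-set parameters (not learned from data).
vocab = ["nasa", "orbit", "windows", "driver"]
p_word_given_z = {
    0: [0.05, 0.05, 0.30, 0.30],  # background state
    1: [0.60, 0.50, 0.05, 0.05],  # state where "nasa" and "orbit" co-occur
}
docs = [
    ["nasa", "orbit", "orbit"],
    ["windows", "driver"],
    ["nasa", "windows"],
]

# Soft partition: each document gets a degree of membership in the Z=1 cluster.
soft = [posterior_z1(binarize(d, vocab), 0.3, p_word_given_z) for d in docs]
for d, s in zip(docs, soft):
    print(d, round(s, 3))
```

In this toy setting, the documents with a high posterior for Z=1 form the cluster that would be read as a topic, and the words most strongly tied to that state would characterize it. A full HLTM stacks many such latent variables into a tree and learns both the structure and the parameters from data.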

