Handling Collocations in Hierarchical Latent Tree Analysis for Topic Modeling

07/10/2020
by Leonard K. M. Poon, et al.

Topic modeling has been one of the most active research areas in machine learning in recent years. Hierarchical latent tree analysis (HLTA) was recently proposed for hierarchical topic modeling and has shown superior performance over state-of-the-art methods. However, the models used in HLTA have a tree structure, so they cannot appropriately represent the different meanings of multiword expressions that share a word. We therefore propose a method for extracting and selecting collocations as a preprocessing step for HLTA. The selected collocations are replaced with single tokens in the bag-of-words model before HLTA is run. Our empirical evaluation shows that the proposed method improved the performance of HLTA on three of the four data sets tested.
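
The replacement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how collocations are extracted or selected, so the list of selected collocations is simply taken as given. The function names `merge_collocations` and `bag_of_words`, the underscore joiner, and the greedy longest-match rule are all assumptions made for illustration.

```python
from collections import Counter


def merge_collocations(tokens, collocations, joiner="_"):
    """Replace each selected collocation with a single joined token.

    Hypothetical helper: scans left to right and, at each position,
    prefers the longest collocation that matches (an assumed rule).
    """
    # Deduplicate, drop empty phrases, and try longer phrases first.
    phrases = sorted({tuple(c) for c in collocations if c}, key=len, reverse=True)
    merged = []
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                merged.append(joiner.join(phrase))
                i += n
                break
        else:
            # No collocation starts here; keep the word as it is.
            merged.append(tokens[i])
            i += 1
    return merged


def bag_of_words(tokens):
    """Count token occurrences, ignoring order (hypothetical helper)."""
    return Counter(tokens)


# Toy document and a toy list of already-selected collocations.
doc = "neural network models and social network analysis use network data".split()
selected = [("neural", "network"), ("social", "network")]

merged = merge_collocations(doc, selected)
bow = bag_of_words(merged)
```

In this toy document, "neural network" and "social network" become the separate tokens `neural_network` and `social_network`, so the two expressions no longer share the count of the single word "network"; only the standalone occurrence of "network" is counted under that word.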

research
08/05/2015

Progressive EM for Latent Tree Models and Hierarchical Topic Detection

Hierarchical latent tree analysis (HLTA) is recently proposed as a new m...
research
12/12/2017

Document Generation with Hierarchical Latent Tree Models

In most probabilistic topic models, a document is viewed as a collection...
research
08/13/2016

Analysis of Morphology in Topic Modeling

Topic models make strong assumptions about their data. In particular, di...
research
04/11/2018

Learning Topics using Semantic Locality

The topic modeling discovers the latent topic probability of the given t...
research
08/29/2017

Unsupervised Terminological Ontology Learning based on Hierarchical Topic Modeling

In this paper, we present hierarchical relation-based latent Dirichlet al...
research
02/14/2012

Multidimensional counting grids: Inferring word order from disordered bags of words

Models of bags of words typically assume topic mixing so that the words ...
research
02/26/2019

Structure Tree-LSTM: Structure-aware Attentional Document Encoders

We propose a method to create document representations that reflect thei...