Improving Topic Models with Latent Feature Word Representations

10/15/2018
by Dat Quoc Nguyen, et al.

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.

research
02/23/2017

LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations

Topic models have been widely used in discovering latent topics which ar...
research
05/01/2017

Learning Topic-Sensitive Word Representations

Distributed word representations are widely used for modeling words in N...
research
03/30/2023

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

Extracting and identifying latent topics in large text corpora has gaine...
research
10/26/2022

ProSiT! Latent Variable Discovery with PROgressive SImilarity Thresholds

The most common ways to explore latent document dimensions are topic mod...
research
12/20/2016

SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

We present a feature vector formation technique for documents - Sparse C...
research
08/05/2020

BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation

Existing topic modeling and text segmentation methodologies generally re...
research
02/22/2018

Learning Topic Models by Neighborhood Aggregation

Topic models are one of the most frequently used models in machine learn...
