LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations

02/23/2017
by   Jarvan Law, et al.
0

Topic models have been widely used in discovering latent topics which are shared across documents in text mining. Vector representations, word embeddings and topic embeddings, map words and topics into a low-dimensional and dense real-value vector space, which have obtained high performance in NLP tasks. However, most of the existing models assume the result trained by one of them are perfect correct and used as prior knowledge for improving the other model. Some other models use the information trained from external large corpus to help improving smaller corpus. In this paper, we aim to build such an algorithm framework that makes topic models and vector representations mutually improve each other within the same corpus. An EM-style algorithm framework is employed to iteratively optimize both topic model and vector representations. Experimental results show that our model outperforms state-of-art methods on various NLP tasks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/15/2018

Improving Topic Models with Latent Feature Word Representations

Probabilistic topic models are widely used to discover latent topics in ...
research
01/27/2018

Improving Word Vector with Prior Knowledge in Semantic Dictionary

Using low dimensional vector space to represent words has been very effe...
research
09/10/2019

Neural Embedding Allocation: Distributed Representations of Topic Models

Word embedding models such as the skip-gram learn vector representations...
research
02/03/2016

"Draw My Topics": Find Desired Topics fast from large scale of Corpus

We develop the "Draw My Topics" toolkit, which provides a fast way to in...
research
11/26/2022

Searching for Discriminative Words in Multidimensional Continuous Feature Space

Word feature vectors have been proven to improve many NLP tasks. With re...
research
05/20/2017

Mixed Membership Word Embeddings for Computational Social Science

Word embeddings improve the performance of NLP systems by revealing the ...
research
01/07/2019

Vector representations of text data in deep learning

In this dissertation we report results of our research on dense distribu...

Please sign up or login with your details

Forgot password? Click here to reset