Topic Modeling Using Distributed Word Embeddings

03/15/2016
by Ramandeep S Randhawa, et al.

We propose a new algorithm for topic modeling, Vec2Topic, that identifies the main topics in a corpus using semantic information captured via high-dimensional distributed word embeddings. Our technique is unsupervised and generates a list of topics ranked by importance. We find that it outperforms existing topic modeling techniques such as Latent Dirichlet Allocation at identifying key topics in user-generated content, such as emails and chats, where topics are diffused across the corpus. We also find that Vec2Topic works equally well for non-user-generated content, such as papers and reports, and for small corpora, including a single document.
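To illustrate the general idea of embedding-based topic extraction, here is a minimal sketch (a hypothetical simplification, not the authors' actual Vec2Topic algorithm): cluster word vectors so that semantically related words group together, then rank each cluster as a "topic" using total word frequency as a stand-in for importance. The toy 2-d embeddings and the `embedding_topics` helper below are illustrative assumptions.

```python
import numpy as np

def embedding_topics(words, vectors, freqs, k=2, iters=20):
    """Cluster word vectors with plain k-means (farthest-point init),
    then rank each cluster ('topic') by the total corpus frequency
    of its member words."""
    X = np.asarray(vectors, dtype=float)
    # Deterministic farthest-point initialization of the k centers.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    # Standard Lloyd iterations: assign, then recompute means.
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Score each cluster by summed word frequency; highest first.
    topics = []
    for j in range(k):
        members = sorted(w for w, l in zip(words, labels) if l == j)
        score = sum(f for f, l in zip(freqs, labels) if l == j)
        topics.append((score, members))
    return sorted(topics, reverse=True)

# Toy 2-d "embeddings" with two obvious semantic clusters.
words = ["price", "market", "stock", "gene", "protein", "cell"]
vecs = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0],
        [0.0, 1.0], [0.1, 0.9], [0.2, 1.1]]
freqs = [5, 3, 2, 4, 4, 1]
for score, topic_words in embedding_topics(words, vecs, freqs):
    print(score, topic_words)
```

In practice the vectors would come from a model such as Skip-gram trained on the corpus, the embeddings would be high-dimensional, and the importance score would be more sophisticated than raw frequency; the sketch only shows why clustering in embedding space recovers coherent topics.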


research
06/21/2017

Jointly Learning Word Embeddings and Latent Topics

Word embedding models such as Skip-gram learn a vector-space representat...
research
07/10/2020

Topic Modeling on User Stories using Word Mover's Distance

Requirements elicitation has recently been complemented with crowd-based...
research
08/11/2016

Sex, drugs, and violence

Automatically detecting inappropriate content can be a difficult NLP tas...
research
11/24/2017

Semantic Map of Sexism: Topic Modelling of Everyday Sexism Project Entries

The Everyday Sexism Project documents everyday examples of sexism report...
research
05/01/2016

Text-mining the NeuroSynth corpus using Deep Boltzmann Machines

Large-scale automated meta-analysis of neuroimaging data has recently es...
research
02/06/2020

Intelligent Arxiv: Sort daily papers by learning users topics preference

Current daily paper releases are becoming increasingly large and areas o...
research
07/22/2020

Better Early than Late: Fusing Topics with Word Embeddings for Neural Question Paraphrase Identification

Question paraphrase identification is a key task in Community Question A...
