Analysis of Morphology in Topic Modeling

08/13/2016
by Chandler May, et al.

Topic models make strong assumptions about their data. In particular, different words are implicitly assumed to have different meanings. Topic models are often used as human-interpretable dimensionality reductions, and a proliferation of words with identical meanings (for example, several inflected forms of one lemma) would undermine the usefulness of representing a topic by its list of top-m words. Although a number of authors have added preprocessing steps such as lemmatization to better accommodate these assumptions, the effects of such data massaging have not been publicly studied. We take first steps toward elucidating the role of morphology in topic modeling by testing the effect of lemmatization on the interpretability of a latent Dirichlet allocation (LDA) model. Using a word intrusion evaluation, we quantitatively demonstrate that lemmatization significantly improves the interpretability of a model learned on Wikipedia articles in a morphologically rich language.
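The two ingredients named above, lemmatization as preprocessing and a word intrusion evaluation, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' protocol. Topics are toy word-to-probability dicts. `lemma_of` is a hypothetical lookup table standing in for a real lemmatizer. The intruder for a topic is taken to be the top-m word of some other topic that is least probable under the target topic. Model precision is the fraction of annotators who pick the true intruder.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

Topic = Dict[str, float]  # word -> probability under one topic (toy representation)


def merge_by_lemma(counts: Dict[str, int], lemma_of: Dict[str, str]) -> Counter:
    """Collapse inflected forms onto their lemma before modeling.

    `lemma_of` is a hypothetical lookup table standing in for a real
    lemmatizer; words missing from it are kept unchanged.
    """
    merged: Counter = Counter()
    for word, count in counts.items():
        merged[lemma_of.get(word, word)] += count
    return merged


def top_words(topic: Topic, m: int) -> List[str]:
    """Return the m most probable words of a topic (ties broken alphabetically)."""
    ranked = sorted(topic.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:m]]


def pick_intruder(topics: Sequence[Topic], k: int, m: int) -> str:
    """Pick an intruder for topic k: a top-m word of another topic that has
    the lowest probability under topic k (raises ValueError if none exists)."""
    own = set(top_words(topics[k], m))
    candidates = {
        w for j, t in enumerate(topics) if j != k for w in top_words(t, m)
    } - own
    # Sorting first makes the choice deterministic when probabilities tie.
    return min(sorted(candidates), key=lambda w: topics[k].get(w, 0.0))


def make_item(topics: Sequence[Topic], k: int, m: int, rng: random.Random) -> List[str]:
    """Shuffle topic k's top-m words together with one intruder."""
    words = top_words(topics[k], m) + [pick_intruder(topics, k, m)]
    rng.shuffle(words)
    return words


def model_precision(intruder: str, answers: Sequence[str]) -> float:
    """Fraction of annotators who identified the true intruder."""
    if not answers:
        return 0.0
    return sum(a == intruder for a in answers) / len(answers)


if __name__ == "__main__":
    toy_topics = [
        {"game": 0.30, "team": 0.25, "season": 0.20, "player": 0.15, "election": 0.01},
        {"election": 0.35, "party": 0.25, "vote": 0.20, "minister": 0.10, "team": 0.02},
    ]
    print(make_item(toy_topics, 0, 3, random.Random(0)))
```

The sketch shows why duplicated forms matter: without `merge_by_lemma`, a topic's top-m list can spend several slots on inflections of one lemma, leaving fewer distinct words for an annotator to judge against the intruder.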

research
05/15/2014

Topic words analysis based on LDA model

Social network analysis (SNA), which is a research field describing and ...
research
03/30/2023

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

Extracting and identifying latent topics in large text corpora has gaine...
research
09/23/2020

Crosslingual Topic Modeling with WikiPDA

We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a cr...
research
07/10/2020

Handling Collocations in Hierarchical Latent Tree Analysis for Topic Modeling

Topic modeling has been one of the most active research areas in machine...
research
02/13/2023

Visualizing Topic Uncertainty in Topic Modelling

Word clouds became a standard tool for presenting results of natural lan...
research
11/30/2016

Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge

While generative models such as Latent Dirichlet Allocation (LDA) have p...
research
03/07/2023

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding

While the successes of transformers across many domains are indisputable...