Topic Modelling of Empirical Text Corpora: Validity, Reliability, and Reproducibility in Comparison to Semantic Maps

06/04/2018
by Tobias Hecking, et al.

Using the 6,638 case descriptions of societal impact submitted for evaluation in the Research Excellence Framework (REF 2014), we replicate the topic model (Latent Dirichlet Allocation or LDA) made in this context and compare the results with factor-analytic results using a traditional word-document matrix (Principal Component Analysis or PCA). Removing a small fraction of documents from the sample, for example, has on average a much larger impact on LDA than on PCA-based models, to the extent that the largest distortion in the case of PCA has less effect than the smallest distortion of LDA-based models. In terms of semantic coherence, however, LDA models outperform PCA-based models. The topic models inform us about the statistical properties of the document sets under study, but the results are statistical and should not be used for a semantic interpretation (for example, in grant selections and micro-decision making, or scholarly work) without follow-up using domain-specific semantic maps.
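
The document-removal test described in the abstract can be illustrated with a minimal sketch. It covers only the PCA side, runs on a small synthetic word-document matrix rather than the REF 2014 corpus, and uses the absolute cosine between leading components as a stability score. That score, the synthetic data, and all function names (`make_corpus`, `first_component`, `stability_after_removal`) are assumptions made for illustration, not the authors' distortion measure or code.

```python
import math
import random


def make_corpus(n_docs=200, n_words=6, seed=0):
    """Synthetic documents-by-words count matrix with two word groups (hypothetical data)."""
    rng = random.Random(seed)
    half = n_words // 2
    rows = []
    for _ in range(n_docs):
        group_a = rng.random() < 0.5
        row = []
        for j in range(n_words):
            on_topic = (j < half) == group_a
            # On-topic words get high counts, off-topic words low counts.
            row.append(rng.randint(3, 8) if on_topic else rng.randint(0, 2))
        rows.append(row)
    return rows


def first_component(rows, iterations=200):
    """Leading principal axis of the column-centered matrix, via power iteration."""
    n_docs, n_words = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_docs for j in range(n_words)]
    centered = [[r[j] - means[j] for j in range(n_words)] for r in rows]
    v = [1.0] + [0.0] * (n_words - 1)  # non-uniform start vector
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        scores = [sum(c[j] * v[j] for j in range(n_words)) for c in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n_docs))
             for j in range(n_words)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def stability_after_removal(rows, fraction, seed=0):
    """Absolute cosine between the leading component before and after
    removing a random fraction of documents (1.0 means unchanged)."""
    rng = random.Random(seed)
    n_keep = len(rows) - int(round(fraction * len(rows)))
    kept = sorted(rng.sample(range(len(rows)), n_keep))
    full = first_component(rows)
    reduced = first_component([rows[i] for i in kept])
    return abs(sum(a * b for a, b in zip(full, reduced)))


corpus = make_corpus()
similarity = stability_after_removal(corpus, fraction=0.05, seed=1)
print(f"stability of first component after removing 5% of documents: {similarity:.4f}")
```

Repeating the same removal with a topic model and comparing topic-word distributions before and after would give the LDA side of the comparison; that step is omitted here because it depends on the choice of LDA implementation and topic-matching procedure.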

research
10/16/2021

n-stage Latent Dirichlet Allocation: A Novel Approach for LDA

Nowadays, data analysis has become a problem as the amount of data is co...
research
07/23/2022

A Data-driven Latent Semantic Analysis for Automatic Text Summarization using LDA Topic Modelling

With the advent and popularity of big data mining and huge text analysis...
research
10/12/2021

Topic Model Supervised by Understanding Map

Inspired by the notion of Center of Mass in physics, an extension called...
research
02/06/2019

Principal Model Analysis Based on Partial Least Squares

Motivated by the Bagging Partial Least Squares (PLS) and Principal Compo...
research
11/30/2021

Bilingual Topic Models for Comparable Corpora

Probabilistic topic models like Latent Dirichlet Allocation (LDA) have b...
research
07/14/2021

Comparison of Canonical Correlation and Partial Least Squares analyses of simulated and empirical data

In this paper, we compared the general forms of CCA and PLS on three sim...
research
10/12/2015

Towards Meaningful Maps of Polish Case Law

In this work, we analyze the utility of two dimensional document maps fo...