A Comparison of Latent Semantic Analysis and Correspondence Analysis for Text Mining

07/25/2021
by Qianqian Qi, et al.

Both latent semantic analysis (LSA) and correspondence analysis (CA) use a singular value decomposition (SVD) for dimensionality reduction. In this article, LSA and CA are compared from a theoretical point of view and applied to both a toy example and an authorship attribution example. In text mining, the interest lies in the relationships among documents and terms: for example, which terms are used more often in which documents. However, the LSA solution displays a mix of marginal effects and these relationships. CA appears to have more attractive properties than LSA. One such property is that, in CA, the effect of the margins is effectively eliminated, so that the CA solution is optimally suited to focusing on the relationships among documents and terms. Three mechanisms for weighting documents and terms are distinguished, and a unifying framework is proposed that includes these three mechanisms and has both CA and LSA as special cases. The discussed methods are illustrated in the authorship attribution example, which concerns the national anthem of the Netherlands.
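The contrast between the two methods can be illustrated with a minimal sketch. It assumes LSA is applied to the raw count matrix and CA is computed in its textbook form, as an SVD of the matrix of standardized residuals; the weighting schemes used in the article itself may differ. The matrix `F` and the function names `lsa_row_coords` and `ca_row_coords` are hypothetical and chosen only for illustration. Documents 0 and 1 use terms in the same proportions, but document 1 is twice as long.

```python
import numpy as np

# Hypothetical toy document-term count matrix (rows: documents, columns: terms).
F = np.array([
    [4.0, 2.0, 0.0, 1.0],
    [8.0, 4.0, 0.0, 2.0],   # same term profile as document 0, twice the length
    [0.0, 1.0, 3.0, 2.0],
])

def lsa_row_coords(F, k=2):
    """LSA on raw counts (an assumption): truncated SVD of F itself."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return U[:, :k] * s[:k]

def ca_row_coords(F, k=2):
    """CA principal row coordinates via SVD of the standardized residuals."""
    P = F / F.sum()              # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)            # row masses (document margins)
    c = P.sum(axis=0)            # column masses (term margins)
    expected = np.outer(r, c)    # frequencies expected under independence
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Rescale by the row masses to obtain principal coordinates.
    return (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]

lsa = lsa_row_coords(F)
ca = ca_row_coords(F)

# LSA: the longer document lies twice as far from the origin (a marginal effect).
print("LSA document coordinates:\n", lsa)
# CA: documents with the same term profile receive the same coordinates.
print("CA document coordinates:\n", ca)
```

In this sketch, the LSA coordinates of document 1 are exactly twice those of document 0, so document length leaks into the solution, whereas the CA coordinates of the two documents coincide because CA depends only on the row profiles relative to the margins.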

