Sparse Correspondence Analysis for Contingency Tables

12/08/2020
by Ruiping Liu, et al.

Since the introduction of the lasso in regression, various sparse methods have been developed for unsupervised settings, such as sparse principal component analysis (s-PCA), sparse canonical correlation analysis (s-CCA) and sparse singular value decomposition (s-SVD). These sparse methods combine feature selection and dimension reduction. One advantage of s-PCA is that it simplifies the interpretation of the (pseudo) principal components, since each one is expressed as a linear combination of a small number of variables. The disadvantages lie, on the one hand, in the difficulty of choosing the number of non-zero coefficients in the absence of a well-established criterion and, on the other hand, in the loss of orthogonality for the components and/or the loadings. In this paper we propose sparse variants of correspondence analysis (CA) for large contingency tables, such as the document-term matrices used in text mining, together with pPMD, a deflation technique derived from projected deflation in s-PCA. We use the fact that CA is a doubly weighted PCA (for rows and columns) or a weighted SVD, as well as a canonical correlation analysis of indicator variables. Applying s-CCA or s-SVD makes it possible to sparsify both row and column weights. The user may tune the level of sparsity of rows and columns and optimize it according to some criterion, and may even decide that no sparsity is needed for rows (or columns) by relaxing one sparsity constraint. The latter is equivalent to applying s-PCA to matrices of row (or column) profiles.
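The statement that CA is a weighted SVD can be illustrated with a small sketch. It computes the matrix of standardized residuals of a contingency table, whose squared singular values sum to the total inertia (the chi-square statistic divided by the grand total). It then applies a generic rank-one sparse SVD that alternates soft-thresholded power iterations. This is not the paper's algorithm and does not implement pPMD: the toy table, the function names and the penalty values `lam_u` and `lam_v` are all assumptions made for illustration.

```python
import numpy as np


def standardized_residuals(N):
    """Return S = D_r^{-1/2} (P - r c^T) D_c^{-1/2} with row masses r and column masses c."""
    P = np.asarray(N, dtype=float)
    P = P / P.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # independence model
    return (P - expected) / np.sqrt(expected), r, c


def soft_threshold(x, lam):
    """Shrink every entry of x towards zero by lam (lasso-type operator)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def unit(x):
    """Scale x to unit Euclidean norm; fail loudly if the penalty removed everything."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        raise ValueError("penalty too large: every coefficient was set to zero")
    return x / norm


def sparse_rank_one(S, lam_u=0.0, lam_v=0.0, n_iter=100):
    """Rank-one sparse SVD of S by alternating soft-thresholded power iterations.

    Illustrative only: with lam_u = lam_v = 0 it returns the leading singular triplet.
    """
    v = np.linalg.svd(S, full_matrices=False)[2][0]   # start from the leading right vector
    for _ in range(n_iter):
        u = unit(soft_threshold(S @ v, lam_u))        # sparse row weights
        v = unit(soft_threshold(S.T @ u, lam_v))      # sparse column weights
    return u, v, float(u @ S @ v)


# Hypothetical 3 x 3 contingency table, used only to exercise the functions.
N = np.array([[20, 5, 1], [3, 15, 4], [2, 6, 18]])
S, r, c = standardized_residuals(N)
singular_values = np.linalg.svd(S, compute_uv=False)
total_inertia = float((singular_values ** 2).sum())

u_dense, v_dense, d_dense = sparse_rank_one(S)                 # no penalty: ordinary CA axis
u_sparse, v_sparse, d_sparse = sparse_rank_one(S, lam_v=0.05)  # penalty on columns only
```

Setting `lam_u=0` while keeping `lam_v > 0` mirrors the option mentioned in the abstract of relaxing one of the two sparsity constraints. In practice the penalties would have to be tuned against some criterion, which this sketch does not attempt.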


