Correlation of Data Reconstruction Error and Shrinkages in Pair-wise Distances under Principal Component Analysis (PCA)

In this ongoing work, I explore certain theoretical and empirical implications of data transformations under PCA. In particular, I state and prove three theorems about PCA, which I paraphrase as follows: 1) PCA without discarding eigenvector rows is injective, but loses this injectivity when eigenvector rows are discarded. 2) PCA without discarding eigenvector rows preserves pair-wise distances, but tends to cause pair-wise distances to shrink when eigenvector rows are discarded. 3) For any pair of points, the shrinkage in pair-wise distance is bounded above by an L1 norm reconstruction error associated with the points. Result 3) suggests that there might exist some correlation between shrinkages in pair-wise distances and the mean square reconstruction error, which is defined as the sum of the eigenvalues associated with the discarded eigenvectors. I therefore performed numerical experiments to measure the correlation between the sum of those eigenvalues and the shrinkages in pair-wise distances. In addition, I performed experiments to check, respectively, the effect of the sum of those eigenvalues and the effect of the shrinkages on classification accuracies under the PCA map. So far, I have obtained the following results on some publicly available data from the UCI Machine Learning Repository: 1) There seems to be a strong correlation between the sum of the eigenvalues associated with discarded eigenvectors and the shrinkages in pair-wise distances. 2) Neither the sum of those eigenvalues nor the shrinkages in pair-wise distances show any strong correlation with classification accuracies.
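The three paraphrased results can be checked numerically on a small example. The following is a minimal sketch, not the experimental setup of this work: it uses synthetic Gaussian data rather than UCI data, and the names `pairwise` and `pca_report` are hypothetical helpers introduced for illustration. Since the abstract does not state the exact form of the bound in 3), the sketch assumes one version that follows from the triangle inequality, namely the sum of the L1 norms of the two points' reconstruction residuals. The covariance is normalized by n so that the mean square reconstruction error matches the sum of the discarded eigenvalues.

```python
import numpy as np

# Synthetic correlated data (an assumption; the work itself uses UCI data sets).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

mu = X.mean(axis=0)
Xc = X - mu
cov = Xc.T @ Xc / len(X)                  # covariance with 1/n normalization
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]


def pairwise(Z):
    """Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pca_report(k):
    """Keep the top-k eigenvector rows and compare distances before and after."""
    W = eigvecs[:, :k].T                  # k x d matrix, rows are eigenvectors
    Z = Xc @ W.T                          # PCA map
    resid = X - (Z @ W + mu)              # per-point reconstruction residual
    shrink = pairwise(X) - pairwise(Z)    # shrinkage in pair-wise distance
    l1 = np.abs(resid).sum(axis=1)        # L1 norm of each residual
    bound = l1[:, None] + l1[None, :]     # assumed form of the bound in 3)
    mse = (resid ** 2).sum(axis=1).mean() # mean square reconstruction error
    return shrink, bound, mse, eigvals[k:].sum()


for k in (6, 4, 2):
    shrink, bound, mse, discarded = pca_report(k)
    print(k, shrink.max(), bool((shrink <= bound + 1e-9).all()), mse, discarded)
```

With all six eigenvector rows kept, the shrinkage is zero up to floating-point error; with fewer rows, the shrinkage is non-negative and stays below the assumed bound, while the mean square reconstruction error agrees with the sum of the discarded eigenvalues.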
