An eigenanalysis of data centering in machine learning

07/10/2014
by Paul Honeine, et al.

Many pattern recognition methods rely on statistical information from centered data, through the eigenanalysis of an empirical central moment, such as the covariance matrix in principal component analysis (PCA), as well as in partial least squares regression, canonical correlation analysis, and Fisher discriminant analysis. Recently, many researchers have advocated working on non-centered data. This is the case, for instance, with the singular value decomposition approach, (kernel) entropy component analysis, the information-theoretic learning framework, and even nonnegative matrix factorization. Moreover, one can also consider a non-centered PCA by using the second-order non-central moment. The main purpose of this paper is to bridge the gap between these two viewpoints in designing machine learning methods. To provide a study at the cornerstone of kernel-based machines, we conduct an eigenanalysis of the inner product matrices of centered and non-centered data, and derive several results connecting their eigenvalues and eigenvectors. Furthermore, we explore the outer product matrices, providing several results that connect the largest eigenvectors of the covariance matrix and its non-centered counterpart. These results lay the groundwork for several extensions beyond conventional centering, such as the weighted mean shift, the rank-one update, and multidimensional scaling. Experiments conducted on simulated and real data illustrate the relevance of this work.
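The relationship the abstract alludes to can be made concrete: the non-central second-order moment is a rank-one update of the covariance matrix by the sample mean, so their spectra are directly comparable. A minimal NumPy sketch (illustrative only; not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) + 2.0  # data with a nonzero mean

# Centered vs non-centered second-order moments
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / len(X)   # covariance matrix (central moment)
M = X.T @ X / len(X)     # non-central second-order moment

# M = C + mu mu^T: a rank-one update by the sample mean
mu = X.mean(axis=0)
print(np.allclose(M, C + np.outer(mu, mu)))  # True

# Since mu mu^T is positive semidefinite, each eigenvalue of M
# dominates the corresponding eigenvalue of C (Weyl's inequality)
ev_C = np.linalg.eigvalsh(C)
ev_M = np.linalg.eigvalsh(M)
print(np.all(ev_M >= ev_C - 1e-12))  # True
```

This is the sense in which centered and non-centered eigenanalyses are connected: the two spectra interlace, and the leading eigenvectors of `M` mix the leading eigenvectors of `C` with the mean direction.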


