A Scalable Approach to Estimating the Rank of High-Dimensional Data

07/30/2021
by   Wenlan Zang, et al.

A key challenge in performing effective analyses of high-dimensional data is finding a signal-rich, low-dimensional representation. For linear subspaces, this is generally done by decomposing a design matrix (via eigenvalue or singular value decomposition) into orthogonal components and then retaining those components with sufficient variation. This is equivalent to estimating the rank of the matrix; deciding which components to retain is generally carried out using heuristic or ad hoc approaches, such as plotting the decreasing sequence of eigenvalues and looking for the "elbow" in the plot. While these approaches have been shown to be effective, a poorly calibrated or misjudged elbow location can result in an overabundance of noise or an underabundance of signal in the low-dimensional representation, making subsequent modeling difficult. In this article, we propose a latent-space-construction procedure that estimates the rank of the detectable signal space of a matrix by retaining the components whose variation is significantly greater than that of random matrices, whose eigenvalues follow a universal Marchenko-Pastur (MP) distribution.
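To make the idea concrete, the sketch below is a minimal, purely illustrative version of MP-based eigenvalue thresholding, not the authors' exact procedure: it standardizes an n x p data matrix, computes the eigenvalues of its sample covariance, and counts those exceeding the upper edge of the Marchenko-Pastur bulk, (1 + sqrt(p/n))^2. The function name estimate_rank_mp and the choice of the deterministic MP edge as the cutoff (rather than, say, a Tracy-Widom-calibrated threshold) are assumptions made for this example.

```python
import numpy as np

def estimate_rank_mp(X):
    """Illustrative rank estimate: count sample-covariance eigenvalues of a
    standardized n x p matrix X that exceed the Marchenko-Pastur upper edge.
    (A sketch of MP-based thresholding, not the paper's exact procedure.)"""
    n, p = X.shape
    # Standardize columns so that pure-noise eigenvalues follow the MP law
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigenvalues of the sample covariance matrix via singular values
    eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2 / (n - 1)
    # MP upper edge for unit-variance noise with aspect ratio gamma = p / n
    gamma = p / n
    upper_edge = (1.0 + np.sqrt(gamma)) ** 2
    # Components whose variance rises above the noise bulk are treated as signal
    return int(np.sum(eigvals > upper_edge))

if __name__ == "__main__":
    # Toy check: rank-5 signal plus i.i.d. noise (hypothetical data)
    rng = np.random.default_rng(0)
    n, p, k = 500, 200, 5
    signal = rng.normal(size=(n, k)) @ rng.normal(size=(k, p)) * 3.0
    X = signal + rng.normal(size=(n, p))
    print(estimate_rank_mp(X))  # typically recovers a rank close to k
```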

Related research

- Hybrid Kronecker Product Decomposition and Approximation (12/06/2019)
- Tracy-Widom law for the extreme eigenvalues of large signal-plus-noise matrices (09/25/2020)
- Fast estimation method for rank of a high-dimensional sparse matrix (10/25/2021)
- On Universal Features for High-Dimensional Learning and Inference (11/20/2019)
- Confidence interval of singular vectors for high-dimensional and low-rank matrix regression (05/24/2018)
- Statistical properties of large data sets with linear latent features (11/08/2021)
- Classification and Reconstruction of High-Dimensional Signals from Low-Dimensional Features in the Presence of Side Information (12/01/2014)
