Unlabeled Principal Component Analysis

01/23/2021
by   Yunzhen Yao, et al.

We consider the problem of principal component analysis from a data matrix where the entries of each column have undergone some unknown permutation, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that for generic enough data, and up to a permutation of the coordinates of the ambient space, there is a unique subspace of minimal dimension that explains the data. We show that a permutation-invariant system of polynomial equations has finitely many solutions, with each solution corresponding to a row permutation of the ground-truth data matrix. Allowing for missing entries on top of permutations leads to the problem of unlabeled matrix completion, for which we give theoretical results of similar flavor. We also propose a two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data has been permuted. Stage-I of this pipeline employs robust-PCA methods to estimate the ground-truth column-space. Equipped with the column-space, stage-II applies methods for linear regression without correspondences to restore the permuted data. A computational study reveals encouraging findings, including the ability of UPCA to handle face images from the Extended Yale-B database with arbitrarily permuted patches of arbitrary size in 0.3 seconds on a standard desktop computer.
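To make the stage-II idea concrete, here is a minimal toy sketch in Python. It assumes the column-space basis is already known exactly (in the pipeline above, stage-I would estimate it) and restores one permuted column by exhaustive search over all reorderings. The exhaustive search, the helper name restore_column, and the sizes m = 5 and r = 2 are illustrative assumptions, not the method used in the paper; brute force is only feasible for very small m.

```python
import itertools
import numpy as np

def restore_column(U, y):
    """Toy stage-II step (hypothetical helper, not the paper's algorithm).

    Tries every reordering of y and keeps the one whose least-squares
    projection onto span(U) has the smallest residual.
    """
    m = U.shape[0]
    best_resid, best_x = np.inf, None
    for perm in itertools.permutations(range(m)):
        candidate = y[list(perm)]
        # Least-squares fit of the reordered column to the subspace.
        coeffs, *_ = np.linalg.lstsq(U, candidate, rcond=None)
        fitted = U @ coeffs
        resid = np.linalg.norm(fitted - candidate)
        if resid < best_resid:
            best_resid, best_x = resid, fitted
    return best_x, best_resid

rng = np.random.default_rng(0)
m, r = 5, 2                                        # assumed toy sizes
U, _ = np.linalg.qr(rng.standard_normal((m, r)))   # orthonormal basis, assumed known
x = U @ rng.standard_normal(r)                     # ground-truth column in span(U)
y = x[rng.permutation(m)]                          # observed column, entries shuffled

x_hat, resid = restore_column(U, y)
print("residual:", resid)
print("max abs error:", np.max(np.abs(x_hat - x)))
```

For generic data the zero-residual reordering is unique, which is why the search can identify the original column. The cost grows as m!, so practical methods for linear regression without correspondences replace the exhaustive loop.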


research
12/31/2018

The Stochastic Complexity of Principal Component Analysis

PCA (principal component analysis) and its variants are ubiquitous techn...
research
06/28/2019

High-dimensional principal component analysis with heterogeneous missingness

We study the problem of high-dimensional Principal Component Analysis (P...
research
06/07/2023

Yet Another Algorithm for Supervised Principal Component Analysis: Supervised Linear Centroid-Encoder

We propose a new supervised dimensionality reduction technique called Su...
research
07/14/2020

Predicting feature imputability in the absence of ground truth

Data imputation is the most popular method of dealing with missing value...
research
10/12/2018

An Algebraic-Geometric Approach to Shuffled Linear Regression

Shuffled linear regression is the problem of performing a linear regress...
research
12/19/2019

Advanced Variations of Two-Dimensional Principal Component Analysis for Face Recognition

The two-dimensional principal component analysis (2DPCA) has become one ...
research
12/03/2018

Permutations Unlabeled beyond Sampling Unknown

A recent result on unlabeled sampling states that with probability one o...
