Detecting approximate replicate components of a high-dimensional random vector with latent structure

10/05/2020
by Xin Bing, et al.

High-dimensional feature vectors are likely to contain sets of measurements that are approximate replicates of one another. In complex applications, or in automated data collection, these feature sets are not known a priori and need to be determined. This work proposes a class of latent factor models on the observed high-dimensional random vector X ∈ ℝ^p, for defining, identifying and estimating the index set of its approximately replicate components. The model class is parametrized by a p × K loading matrix A that contains a hidden sub-matrix whose rows can be partitioned into groups of parallel vectors. Under this model class, a set of approximate replicate components of X corresponds to a set of parallel rows in A: these entries of X are, up to scale and additive error, the same linear combination of the K latent factors; the value of K is itself unknown. The problem of finding approximate replicates in X reduces to identifying, and estimating, the location of the hidden sub-matrix within A, and the partition of its row index set H. Both H and its partition can be fully characterized in terms of a new family of criteria based on the correlation matrix of X, and their identifiability, as well as that of the unknown latent dimension K, is obtained as a consequence. The constructive nature of the identifiability arguments enables computationally efficient procedures, with consistency guarantees. When A has the errors-in-variable parametrization, the difficulty of the problem is elevated. The task becomes that of separating out groups of parallel rows that are proportional to canonical basis vectors from other dense parallel rows in A. This is met under a scale assumption, via a principled way of selecting the target row indices, guided by the successive maximization of Schur complements of appropriate covariance matrices.
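To make the link between parallel rows of A and approximate replicates concrete, here is a minimal population-level sketch in Python. It assumes the model X = AZ + E with independent noise components, so that Cov(X) = A Cov(Z) Aᵀ + diag(noise variances). The loading matrix, factor covariance, noise variances and the helper names `population_covariance` and `offdiag_cosine` are hypothetical choices for illustration. The proportionality check is a simple stand-in and not the family of criteria developed in the paper: if rows i and j of A are parallel, then rows i and j of Cov(X) are proportional once columns i and j are dropped, so their absolute cosine equals 1.

```python
import numpy as np

def population_covariance(A, cov_Z, noise_var):
    """Covariance of X = A Z + E, with Cov(Z) = cov_Z and independent noise E."""
    return A @ cov_Z @ A.T + np.diag(noise_var)

def offdiag_cosine(Sigma, i, j):
    """Absolute cosine between rows i and j of Sigma, ignoring columns i and j."""
    mask = np.ones(Sigma.shape[0], dtype=bool)
    mask[[i, j]] = False  # drop the entries affected by the noise variances
    u, v = Sigma[i, mask], Sigma[j, mask]
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical loading matrix with p = 5 components and K = 2 factors.
A = np.array([
    [1.0, 0.5],
    [2.0, 1.0],   # 2 * row 0: parallel to row 0
    [0.0, 1.0],
    [0.0, -3.0],  # -3 * row 2: parallel to row 2
    [1.0, -1.0],  # not parallel to any other row
])
cov_Z = np.array([[1.0, 0.3], [0.3, 1.0]])        # assumed factor covariance
noise_var = np.array([0.5, 0.2, 0.1, 0.4, 0.3])   # assumed noise variances

Sigma = population_covariance(A, cov_Z, noise_var)

score_parallel = offdiag_cosine(Sigma, 0, 1)      # parallel pair
score_parallel_neg = offdiag_cosine(Sigma, 2, 3)  # parallel pair, negative scale
score_other = offdiag_cosine(Sigma, 0, 4)         # non-parallel pair

print(score_parallel, score_parallel_neg, score_other)
```

In this example the two parallel pairs score 1 up to floating-point error, while the non-parallel pair scores well below 1. With sample covariances the scores would only be approximately 1, and proportionality of covariance rows alone does not guarantee parallel rows of A in general, which is where identifiability conditions such as those studied in the paper come in.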


research
03/30/2020

Adaptive Estimation of Multivariate Regression with Hidden Variables

This paper studies the estimation of the coefficient matrix in multivar...
research
04/23/2017

Sparse Latent Factor Models with Pure Variables for Overlapping Clustering

The problem of overlapping variable clustering, ubiquitous in data scien...
research
11/02/2018

RSVP-graphs: Fast High-dimensional Covariance Matrix Estimation under Latent Confounding

In this work we consider the problem of estimating a high-dimensional p ...
research
10/27/2018

Groupcast Index Coding Problem: Joint Extensions

The groupcast index coding problem is the most general version of the cl...
research
09/29/2022

Modeling High-Dimensional Matrix-Variate Observations by Tensor Factorization

In the era of big data, it is prevailing of high-dimensional matrix-vari...
research
02/09/2015

High dimensional errors-in-variables models with dependent measurements

Suppose that we observe y ∈R^f and X ∈R^f × m in the following errors-in...
research
04/13/2017

Infinite Sparse Structured Factor Analysis

Matrix factorisation methods decompose multivariate observations as line...
