Hybrid Subspace Learning for High-Dimensional Data

08/05/2018
by Micol Marchetti-Bowick, et al.

The high-dimensional data setting, in which the number of features p far exceeds the number of samples n (p >> n), is a challenging statistical paradigm that appears in many real-world problems. In this setting, learning a compact, low-dimensional representation of the data can substantially help distinguish signal from noise. One way to achieve this is subspace learning, which estimates a small set of latent features that capture most of the variance in the original data. Most existing subspace learning models, such as PCA, assume that the data can be fully represented by its embedding in one or more latent subspaces. In this work, we argue that this assumption is unsuitable for many high-dimensional datasets: often only some variables can easily be projected to a low-dimensional space. We propose a hybrid dimensionality reduction technique in which some features are mapped to a low-dimensional subspace while others remain in the original space. Our model leads to more accurate estimation of the latent space and lower reconstruction error. We present a simple optimization procedure for the resulting biconvex problem and show results on synthetic data that demonstrate the advantages of our approach over existing methods. Finally, we demonstrate the effectiveness of this method for extracting meaningful features from both gene expression and video background subtraction datasets.
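To make the hybrid idea concrete, here is a minimal Python sketch under stated assumptions. The abstract does not give the objective, so the function name `hybrid_subspace`, the penalty weight `lam`, and the objective 0.5 * ||X - Z A - W||_F^2 + lam * sum_j ||W[:, j]||_2 are all hypothetical stand-ins, not the authors' formulation. The sketch alternates between a rank-k fit (truncated SVD) and a column-wise group soft-threshold, so that a feature whose residual stays large keeps a nonzero column in W, that is, it remains in the original space.

```python
import numpy as np

def hybrid_subspace(X, k, lam, n_iter=50):
    """Alternate a rank-k fit with column-wise shrinkage (illustrative only).

    Assumed objective (not taken from the paper):
        0.5 * ||X - Z @ A - W||_F^2 + lam * sum_j ||W[:, j]||_2
    """
    n, p = X.shape
    W = np.zeros((n, p))
    history = []
    for _ in range(n_iter):
        # Step 1: with W fixed, the best rank-k fit to X - W is its truncated SVD.
        U, s, Vt = np.linalg.svd(X - W, full_matrices=False)
        Z = U[:, :k] * s[:k]   # latent features, shape (n, k)
        A = Vt[:k, :]          # loadings, shape (k, p)
        # Step 2: with Z and A fixed, group soft-threshold each residual column.
        R = X - Z @ A
        norms = np.linalg.norm(R, axis=0)
        scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        W = R * scale
        # Record the assumed objective after both steps.
        fit = 0.5 * np.sum((X - Z @ A - W) ** 2)
        penalty = lam * np.sum(np.linalg.norm(W, axis=0))
        history.append(fit + penalty)
    return Z, A, W, history

# Synthetic example: 40 features lie in a rank-2 subspace, 10 are pure noise.
rng = np.random.default_rng(0)
n, p_low, p_high, k = 30, 40, 10, 2
low = rng.normal(size=(n, k)) @ rng.normal(size=(k, p_low))
high = rng.normal(size=(n, p_high))
X = np.hstack([low, high])

Z, A, W, history = hybrid_subspace(X, k=k, lam=1.0)
kept = np.flatnonzero(np.linalg.norm(W, axis=0) > 0)
print("features kept in the original space:", kept)
```

Each step exactly minimizes the assumed objective over one block of variables, so the recorded objective cannot increase from one iteration to the next. Which features end up with nonzero columns in W depends on `lam` and on the data, so the printed indices should be treated as illustrative.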
