Sharp-SSL: Selective high-dimensional axis-aligned random projections for semi-supervised learning

04/18/2023
by Tengyao Wang, et al.

We propose a new method for high-dimensional semi-supervised learning problems based on the careful aggregation of the results of a low-dimensional procedure applied to many axis-aligned random projections of the data. Our primary goal is to identify important variables for distinguishing between the classes; existing low-dimensional methods can then be applied for final class assignment. Motivated by a generalized Rayleigh quotient, we score projections according to the traces of the estimated whitened between-class covariance matrices on the projected data. This enables us to assign an importance weight to each variable for a given projection, and to select our signal variables by aggregating these weights over high-scoring projections. Our theory shows that the resulting Sharp-SSL algorithm is able to recover the signal coordinates with high probability when we aggregate over sufficiently many random projections and when the base procedure estimates the whitened between-class covariance matrix sufficiently well. The Gaussian EM algorithm is a natural choice as a base procedure, and we provide a new analysis of its performance in semi-supervised settings that controls the parameter estimation error in terms of the proportion of labeled data in the sample. Numerical results on both simulated data and a real colon tumor dataset support the excellent empirical performance of the method.
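The scoring and aggregation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes every observation is labeled and swaps the semi-supervised Gaussian EM base procedure for plain plug-in estimates of the within-class and between-class covariance matrices. The function name `sharp_ssl_sketch`, the parameters `d`, `n_proj` and `top_frac`, and the choice of the diagonal of the whitened matrix as the per-variable weight are all hypothetical, introduced only for illustration.

```python
import numpy as np

def sharp_ssl_sketch(X, y, d=3, n_proj=500, top_frac=0.2, seed=0):
    """Illustrative variable-importance scores (hypothetical helper).

    Assumption: all rows of X are labeled by y, so a supervised plug-in
    estimate stands in for the paper's semi-supervised EM base procedure.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    classes = np.unique(y)
    scores = np.empty(n_proj)
    weights = np.zeros((n_proj, p))

    for b in range(n_proj):
        # Axis-aligned random projection: keep d coordinates chosen at random.
        idx = rng.choice(p, size=d, replace=False)
        Z = X[:, idx]
        overall_mean = Z.mean(axis=0)

        W = np.zeros((d, d))  # pooled within-class covariance
        B = np.zeros((d, d))  # between-class covariance
        for c in classes:
            Zc = Z[y == c]
            mc = Zc.mean(axis=0)
            W += (Zc - mc).T @ (Zc - mc) / n
            diff = (mc - overall_mean)[:, None]
            B += (len(Zc) / n) * (diff @ diff.T)
        W += 1e-8 * np.eye(d)  # small ridge for numerical stability

        # Whitened between-class covariance: W^{-1/2} B W^{-1/2}.
        evals, evecs = np.linalg.eigh(W)
        W_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
        M = W_inv_sqrt @ B @ W_inv_sqrt

        scores[b] = np.trace(M)        # projection score
        weights[b, idx] = np.diag(M)   # assumed per-variable weight

    # Aggregate the weights over the highest-scoring projections only.
    n_top = max(1, int(top_frac * n_proj))
    top = np.argsort(scores)[-n_top:]
    return weights[top].mean(axis=0)
```

Given a data matrix `X` of shape `(n, p)` and a label vector `y`, calling `sharp_ssl_sketch(X, y)` returns one importance value per variable; the variables with the largest values are the candidate signal coordinates, which can then be passed to a low-dimensional classifier.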


