Clustering High-dimensional Data via Feature Selection

10/27/2022
by Tianqi Liu, et al.

High-dimensional clustering is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray and RNA-seq data. In this paper, we propose a new clustering procedure called Spectral Clustering with Feature Selection (SC-FS), which proceeds in three steps: we first obtain an initial estimate of the labels via spectral clustering; we then select the small fraction of features with the largest R-squared with respect to these labels, i.e., the largest proportion of variation explained by the group labels; finally, we cluster again using only the selected features. Under mild conditions, we prove that the proposed method identifies all informative features with high probability and achieves the minimax optimal clustering error rate for the sparse Gaussian mixture model. Applications of SC-FS to four real-world data sets demonstrate its usefulness in clustering high-dimensional data.
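
The three steps described in the abstract (initial labels, R-squared screening, re-clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `r_squared_per_feature`, `select_top_features`, `sc_fs`, and the `cluster_fn` argument are hypothetical, and the number of retained features `s` is assumed to be supplied by the user. In practice `cluster_fn` would be a spectral clustering routine; here a toy stand-in (`toy_split`) is used so that the sketch runs without third-party libraries.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]                      # rows = samples, columns = features
ClusterFn = Callable[[Matrix, int], List[int]]  # hypothetical signature: (data, k) -> labels


def r_squared_per_feature(X: Matrix, labels: Sequence[int]) -> List[float]:
    """R-squared of each feature with respect to the labels:
    1 - (within-group sum of squares) / (total sum of squares)."""
    n, p = len(X), len(X[0])
    groups: Dict[int, List[int]] = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    scores = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        total = sum((v - mean) ** 2 for v in col)
        within = 0.0
        for idx in groups.values():
            g_mean = sum(col[i] for i in idx) / len(idx)
            within += sum((col[i] - g_mean) ** 2 for i in idx)
        # A constant feature explains nothing; score it 0 to avoid dividing by zero.
        scores.append(0.0 if total == 0 else 1.0 - within / total)
    return scores


def select_top_features(X: Matrix, labels: Sequence[int], s: int) -> List[int]:
    """Indices (in ascending order) of the s features with the largest R-squared."""
    scores = r_squared_per_feature(X, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:s])


def sc_fs(X: Matrix, k: int, s: int, cluster_fn: ClusterFn) -> Tuple[List[int], List[int]]:
    """Cluster, screen features by R-squared, then cluster again on the kept features."""
    initial = cluster_fn(X, k)                       # step 1: initial labels
    selected = select_top_features(X, initial, s)    # step 2: feature screening
    X_sel = [[row[j] for j in selected] for row in X]
    final = cluster_fn(X_sel, k)                     # step 3: re-cluster
    return final, selected


def toy_split(X: Matrix, k: int) -> List[int]:
    """Stand-in for spectral clustering (assumption, k = 2 only):
    split samples by whether their row mean exceeds the overall mean."""
    row_means = [sum(row) / len(row) for row in X]
    cutoff = sum(row_means) / len(row_means)
    return [1 if m > cutoff else 0 for m in row_means]


# Tiny example: only the first feature separates the two groups.
X_example = [
    [0.0, 5.0, 1.0],
    [0.2, 1.0, 3.0],
    [4.0, 4.0, 2.0],
    [4.2, 2.0, 2.0],
]
final_labels, kept = sc_fs(X_example, k=2, s=1, cluster_fn=toy_split)
```

In this toy example the screening step keeps only the first feature, since the other two have the same mean in both groups and therefore an R-squared of zero. The choice of `s` and the clustering routine are left open here; the paper's theoretical guarantees concern its own specification of these, which this sketch does not reproduce.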


