Sparse K-Means with ℓ_∞/ℓ_0 Penalty for High-Dimensional Data Clustering

03/31/2014
by Xiangyu Chang, et al.

Sparse clustering, which aims to find a proper partition of an extremely high-dimensional data set containing redundant noise features, has attracted increasing interest in recent years. Existing studies commonly solve the problem within a framework that maximizes the weighted feature contributions subject to an ℓ_2/ℓ_1 penalty. This framework, however, has two serious drawbacks: its solution unavoidably includes a considerable portion of redundant noise features in many situations, and it neither offers an intuitive explanation of why it can select relevant features nor leads to any theoretical guarantee of feature selection consistency. In this article, we attempt to overcome these drawbacks by developing a new sparse clustering framework based on an ℓ_∞/ℓ_0 penalty. First, we introduce new concepts of optimal partitions and noise features for high-dimensional data clustering problems, through which the previously known framework can be intuitively explained. Then, we apply the suggested ℓ_∞/ℓ_0 framework to formulate a new sparse k-means model with the ℓ_∞/ℓ_0 penalty (ℓ_0-k-means for short), and we propose an efficient iterative algorithm for solving it. To understand the behavior of ℓ_0-k-means more deeply, we prove that the solution yielded by the ℓ_0-k-means algorithm is feature-selection consistent whenever the data matrix is generated from a high-dimensional Gaussian mixture model. Finally, we report experiments on both synthetic data and the Allen Developing Mouse Brain Atlas data showing that the proposed ℓ_0-k-means exhibits better noise-feature detection capacity than the previously known sparse k-means with the ℓ_2/ℓ_1 penalty (ℓ_1-k-means for short).
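For context, the ℓ_2/ℓ_1 framework referenced above is the sparse k-means of Witten and Tibshirani (2010), which maximizes a weighted per-feature between-cluster sum of squares (BCSS). A rough sketch of the two objectives follows; the ℓ_0-k-means constraint set shown here is a hedged reading of the abstract (an ℓ_∞ bound plus an ℓ_0 cardinality bound on the weights), not necessarily the paper's exact formulation.

% l_1-k-means (Witten & Tibshirani, 2010): weighted per-feature BCSS
% under an l_2/l_1 constraint on the weight vector w
\max_{C,\,w}\ \sum_{j=1}^{p} w_j \,\mathrm{BCSS}_j(C)
\quad\text{s.t.}\quad \|w\|_2 \le 1,\ \|w\|_1 \le s,\ w_j \ge 0

% l_0-k-means (assumed form): the l_infty/l_0 constraint pushes the
% weights toward 0/1 indicators of at most s selected features
\max_{C,\,w}\ \sum_{j=1}^{p} w_j \,\mathrm{BCSS}_j(C)
\quad\text{s.t.}\quad \|w\|_\infty \le 1,\ \|w\|_0 \le s,\ w_j \ge 0

where \mathrm{BCSS}_j(C) = \sum_i (x_{ij} - \bar{x}_j)^2 - \sum_k \sum_{i \in C_k} (x_{ij} - \bar{x}_{kj})^2 is the between-cluster sum of squares of feature j under partition C.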
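Under that assumed 0/1-weight reading, the iterative algorithm mentioned in the abstract can be sketched as alternating between k-means on the currently selected features and a hard top-s reselection by per-feature BCSS. The Python below is a minimal illustrative sketch of that alternation; the function names (e.g. l0_sparse_kmeans) and the top-s update rule are our assumptions, not the authors' published procedure.

import numpy as np
from sklearn.cluster import KMeans

def bcss_per_feature(X, labels, k):
    """Per-feature between-cluster sum of squares: total SS minus
    within-cluster SS, computed feature by feature."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in range(k):
        Xc = X[labels == c]
        if Xc.shape[0] > 0:
            within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return total - within

def l0_sparse_kmeans(X, k, s, n_iter=20, seed=0):
    """Alternate (1) k-means on the selected features and (2) hard
    reselection of the s features with the largest BCSS. The selected
    index set plays the role of the 0/1 weight vector."""
    p = X.shape[1]
    selected = np.arange(p)            # start from all features
    labels = None
    for _ in range(n_iter):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = km.fit_predict(X[:, selected])
        scores = bcss_per_feature(X, labels, k)
        new_selected = np.argsort(scores)[::-1][:s]
        if set(new_selected) == set(selected):
            break                      # selection stabilized
        selected = new_selected
    return labels, np.sort(selected)

# Toy check: 2 informative features, 48 pure-noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X[:100, :2] += 3.0                     # shift half the points in features 0 and 1
labels, features = l0_sparse_kmeans(X, k=2, s=2)
print(features)                        # ideally [0 1]

Note the design contrast: with 0/1 weights the update is a simple top-s hard selection, whereas the ℓ_2/ℓ_1 version requires a soft-thresholding step to satisfy the lasso constraint, which is what can leave small nonzero weights on noise features.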


Related research

03/24/2019 · A Strongly Consistent Sparse k-means Clustering with Direct l_1 Penalization on Variable Weights
We propose the Lasso Weighted k-means (LW-k-means) algorithm as a simple...

10/27/2022 · Clustering High-dimensional Data via Feature Selection
High-dimensional clustering analysis is a challenging problem in statist...

02/07/2023 · Sparse GEMINI for Joint Discriminative Clustering and Feature Selection
Feature selection in clustering is a hard task which involves simultaneo...

03/30/2021 · A General Framework of Nonparametric Feature Selection in High-Dimensional Data
Nonparametric feature selection in high-dimensional data is an important...

02/20/2020 · A Scalable Framework for Sparse Clustering Without Shrinkage
Clustering, a fundamental activity in unsupervised learning, is notoriou...

02/16/2022 · Using the left Gram matrix to cluster high dimensional data
For high dimensional data, where P features for N objects (P >> N) are r...

03/21/2018 · Clustering to Reduce Spatial Data Set Size
Traditionally it had been a problem that researchers did not have access...
