Biclustering Via Sparse Clustering

07/11/2014
by Qian Liu, et al.

In many situations it is desirable to identify clusters that differ with respect to only a subset of features. Such clusters may represent homogeneous subgroups of patients with a disease, such as cancer or chronic pain. We define a bicluster to be a submatrix U of a larger data matrix X such that the features and observations in U differ from those not contained in U. For example, the observations in U could have different means or variances with respect to the features in U. We propose a general framework for biclustering based on the sparse clustering method of Witten and Tibshirani (2010). We develop a method for identifying features that belong to biclusters. This framework can be used to identify biclusters that differ with respect to the means of the features, the variance of the features, or more general differences. We apply these methods to several simulated and real-world data sets and compare the results of our method with several previously published methods. The results of our method compare favorably with existing methods with respect to both predictive accuracy and computing time.
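To make the definition of a bicluster concrete, the following is a minimal sketch and not the authors' algorithm. It plants a mean-shifted submatrix U in a simulated Gaussian matrix X, splits the observations with plain 2-means, and then soft-thresholds the per-feature between-cluster sums of squares to obtain sparse feature weights, loosely in the spirit of the sparse clustering of Witten and Tibshirani (2010). All names (`simulate`, `two_means`, `sparse_weights`, `find_bicluster`), the matrix sizes, the shift of 4, and the fixed threshold fraction `frac` are assumptions chosen for illustration. In particular, the fixed threshold stands in for the L1-constrained tuning used in sparse clustering, and the clustering is done in a single pass rather than alternated with the weight update.

```python
import random

def simulate(n_obs=60, n_feat=30, n_rows=15, n_cols=8, shift=4.0, seed=0):
    """Gaussian noise matrix with a mean-shifted block U in the top-left corner."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_feat)] for _ in range(n_obs)]
    for i in range(n_rows):
        for j in range(n_cols):
            X[i][j] += shift
    return X

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def two_means(X, n_iter=20):
    """Plain Lloyd iterations for K = 2, seeded with the rows of largest and smallest sum."""
    sums = [sum(r) for r in X]
    centers = [X[sums.index(max(sums))][:], X[sums.index(min(sums))][:]]
    labels = [0] * len(X)
    for _ in range(n_iter):
        labels = [0 if sq_dist(r, centers[0]) <= sq_dist(r, centers[1]) else 1
                  for r in X]
        for k in (0, 1):
            members = [r for r, lab in zip(X, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = column_means(members)
    return labels

def feature_bcss(X, labels):
    """Between-cluster sum of squares, computed separately for each feature."""
    overall = column_means(X)
    bcss = [0.0] * len(overall)
    for k in (0, 1):
        members = [r for r, lab in zip(X, labels) if lab == k]
        if not members:
            continue
        means = column_means(members)
        for j, (m, g) in enumerate(zip(means, overall)):
            bcss[j] += len(members) * (m - g) ** 2
    return bcss

def sparse_weights(bcss, frac=0.25):
    """Soft-threshold at frac * max(bcss) (an assumed rule), then scale to unit L2 norm."""
    delta = frac * max(bcss)
    soft = [max(b - delta, 0.0) for b in bcss]
    norm = sum(s * s for s in soft) ** 0.5
    return [s / norm for s in soft]

def find_bicluster(X):
    """Rows: the smaller of the two clusters. Columns: features with nonzero weight."""
    labels = two_means(X)
    weights = sparse_weights(feature_bcss(X, labels))
    small = min((0, 1), key=labels.count)
    rows = [i for i, lab in enumerate(labels) if lab == small]
    cols = [j for j, w in enumerate(weights) if w > 0]
    return rows, cols

rows, cols = find_bicluster(simulate())
print(len(rows), cols)
```

The sketch only covers biclusters that differ in their means. A variance-based bicluster of the kind mentioned in the abstract would need a different per-feature criterion in place of the between-cluster sum of squares.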


research
01/29/2012

A robust and sparse K-means clustering algorithm

In many situations where the interest lies in identifying clusters one m...
research
07/01/2013

Semi-supervised clustering methods

Cluster analysis methods seek to partition a data set into homogeneous s...
research
12/23/2017

Merging K-means with hierarchical clustering for identifying general-shaped groups

Clustering partitions a dataset such that observations placed together i...
research
04/13/2013

Identification of relevant subtypes via preweighted sparse clustering

Cluster analysis methods are used to identify homogeneous subgroups in a...
research
02/23/2016

A Simple Approach to Sparse Clustering

Consider the problem of sparse clustering, where it is assumed that only...
research
04/04/2022

Multivariate Microaggregation of Set-Valued Data

Data controllers manage immense data, and occasionally, it is released p...
research
08/31/2021

Clustering of Pain Dynamics in Sickle Cell Disease from Sparse, Uneven Samples

Irregularly sampled time series data are common in a variety of fields. ...
