Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

02/21/2008
by Benhuai Xie, et al.

Cluster analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask the underlying clustering structure, so removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing penalized likelihood approaches to model-based clustering all assume a common diagonal covariance matrix across clusters, an assumption that may not hold in practice. To analyze high-dimensional data, particularly data with relatively small sample sizes, this article introduces a novel approach that shrinks the variances together with the means, in the more general setting of cluster-specific (diagonal) covariance matrices. Furthermore, a specific form of penalty permits selection of grouped variables, in which a group of variables is included or excluded as a whole; this facilitates incorporating subject-matter knowledge, such as gene functions when clustering microarray samples for disease subtype discovery. For implementation, EM algorithms are derived for parameter estimation, in which the M-steps clearly demonstrate the effects of shrinkage and thresholding. Numerical examples, including an application to acute leukemia subtype discovery with microarray gene expression data, are provided to demonstrate the utility and advantage of the proposed method.
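
The shrinkage and thresholding behavior mentioned for the M-steps can be illustrated with the two standard operators that typically arise from L1-type and grouped penalties. The following is a minimal sketch, not the paper's actual update formulas: the function names `soft_threshold` and `group_soft_threshold` and the threshold argument `t` are hypothetical, and in a real M-step the threshold would depend on the penalty parameter, the cluster variances, and the posterior cluster weights.

```python
import math


def soft_threshold(x, t):
    """Illustrative scalar operator (assumed, not taken from the paper).

    Shrinks x toward zero by t and returns zero when |x| <= t.
    """
    return math.copysign(max(abs(x) - t, 0.0), x)


def group_soft_threshold(v, t):
    """Illustrative grouped operator (assumed, not taken from the paper).

    Shrinks the whole vector v toward zero by a common factor and sets
    every component to zero when the Euclidean norm of v is at most t.
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * c for c in v]


# A large estimate is shrunk, a small one is thresholded to zero.
shrunk = soft_threshold(3.0, 1.0)
zeroed = soft_threshold(-0.5, 1.0)

# A group is either kept (and shrunk) or removed as a whole.
kept_group = group_soft_threshold([3.0, 4.0], 1.0)
dropped_group = group_soft_threshold([0.3, 0.4], 1.0)
```

In this sketch, a variable whose estimate is thresholded to zero in every cluster would carry no clustering information, which is the sense in which thresholding performs variable selection; the grouped operator does the same for a whole set of variables at once.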


research
04/10/2011

Slicing: Nonsingular Estimation of High Dimensional Covariance Matrices Using Multiway Kronecker Delta Covariance Structures

Nonsingular estimation of high dimensional covariance matrices is an imp...
research
01/02/2016

Joint Estimation of Precision Matrices in Heterogeneous Populations

We introduce a general framework for estimation of inverse covariance, o...
research
11/02/2012

APPLE: Approximate Path for Penalized Likelihood Estimators

In high-dimensional data analysis, penalized likelihood estimators are s...
research
09/10/2020

A Family of Mixture Models for Biclustering

Biclustering is used for simultaneous clustering of the observations and...
research
11/21/2017

Model-based Clustering with Sparse Covariance Matrices

Finite Gaussian mixture models are widely used for model-based clusterin...
research
05/30/2012

Finding Important Genes from High-Dimensional Data: An Appraisal of Statistical Tests and Machine-Learning Approaches

Over the past decades, statisticians and machine-learning researchers ha...
research
01/31/2020

A graph clustering approach to localization for adaptive covariance tuning in data assimilation based on state-observation mapping

An original graph clustering approach to efficient localization of error...
