Pooled variable scaling for cluster analysis

12/22/2019
by Jakob Raymaekers, et al.

We propose a new approach for scaling prior to cluster analysis based on the concept of pooled variance. Unlike available scaling procedures, such as scaling by the standard deviation or the range, our proposed scale avoids dampening the beneficial effect of informative clustering variables. We confirm through an extensive simulation study and applications to well-known real data examples that the proposed scaling method is safe and generally useful. Finally, we use our approach to cluster a high-dimensional genomic dataset consisting of gene expression data for several specimens of breast cancer tissue.
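To illustrate why the choice of scale matters, the sketch below compares the overall standard deviation with a pooled within-group standard deviation on a single toy variable. It is a minimal illustration of the general idea of pooled variance, not the authors' procedure: the function name `pooled_std`, the toy data, and especially the group labels are assumptions made for the example. In practice the groups are unknown before clustering and would have to be estimated, which the abstract does not describe.

```python
import math
from statistics import mean, stdev


def pooled_std(values, labels):
    """Pooled within-group standard deviation of one variable.

    Hypothetical helper: sums the squared deviations from each group's
    own mean and divides by (n - k), where k is the number of groups.
    """
    groups = {}
    for x, g in zip(values, labels):
        groups.setdefault(g, []).append(x)
    n, k = len(values), len(groups)
    if n <= k:
        raise ValueError("need more observations than groups")
    ss = 0.0
    for xs in groups.values():
        m = mean(xs)
        ss += sum((x - m) ** 2 for x in xs)
    return math.sqrt(ss / (n - k))


# Toy variable with two well-separated groups (labels assumed known here).
values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]

overall = stdev(values)               # inflated by the gap between groups
pooled = pooled_std(values, labels)   # reflects only within-group spread

# Distance between the two group means after each kind of scaling.
gap = mean(values[3:]) - mean(values[:3])
gap_sd = gap / overall
gap_pooled = gap / pooled

print(f"overall sd = {overall:.3f}, pooled sd = {pooled:.3f}")
print(f"group separation: sd-scaled = {gap_sd:.3f}, pooled-scaled = {gap_pooled:.3f}")
```

On this toy variable the overall standard deviation is larger than the pooled one because it absorbs the between-group gap, so dividing by it shrinks the separation between the groups more than dividing by the pooled scale does.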

research
12/22/2019

Pooled scale estimators for scaling prior to cluster analysis

We propose a new approach for scaling prior to cluster analysis based on...
research
02/08/2022

Adaptive Bayesian Variable Clustering via Structural Learning of Breast Cancer Data

Clustering of proteins is of interest in cancer cell biology. This artic...
research
05/17/2022

Shape complexity in cluster analysis

In cluster analysis, a common first step is to scale the data aiming to ...
research
06/26/2018

Bayesian Multi-study Factor Analysis for High-throughput Biological Data

This paper presents a new modeling strategy for joint unsupervised analy...
research
08/18/2017

Data-Driven Tree Transforms and Metrics

We consider the analysis of high dimensional data given in the form of a...
research
02/22/2016

An Effective and Efficient Approach for Clusterability Evaluation

Clustering is an essential data mining tool that aims to discover inhere...
research
10/02/2020

Regularized K-means through hard-thresholding

We study a framework of regularized K-means methods based on direct pena...
