Data-Driven Tree Transforms and Metrics

08/18/2017
by Gal Mishne, et al.

We consider the analysis of high-dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining a representation and a metric that respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features to cases where the data does not fall into clearly disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer sub-types and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.
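To make the idea of a tree-based metric concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a hand-written partition tree over four features (the hypothetical `TREE`) and compares two observations by summing, over every folder in the tree, the absolute difference of their folder averages, weighted by the folder's relative size. The weighting and all names are illustrative assumptions.

```python
from typing import List, Sequence

# Hypothetical partition tree over 4 features (an assumption for illustration).
# Each level lists folders, i.e. groups of feature indices; deeper levels are finer.
TREE: List[List[List[int]]] = [
    [[0, 1, 2, 3]],        # level 0: root folder containing all features
    [[0, 1], [2, 3]],      # level 1: two folders
    [[0], [1], [2], [3]],  # level 2: one leaf per feature
]


def folder_mean(x: Sequence[float], folder: Sequence[int]) -> float:
    """Average of the entries of x whose indices lie in the folder."""
    return sum(x[i] for i in folder) / len(folder)


def tree_distance(x: Sequence[float], y: Sequence[float],
                  tree: List[List[List[int]]]) -> float:
    """Sum over all folders of (relative folder size) * |mean(x) - mean(y)|.

    The size-proportional weight is an assumed choice, not taken from the paper.
    """
    n_features = len(x)
    total = 0.0
    for level in tree:
        for folder in level:
            weight = len(folder) / n_features
            total += weight * abs(folder_mean(x, folder) - folder_mean(y, folder))
    return total


# Three toy observations over the same four features.
x = [1.0, 1.0, 0.0, 0.0]
y = [0.0, 0.0, 1.0, 1.0]
z = [1.0, 0.0, 1.0, 0.0]

d_xy = tree_distance(x, y, TREE)  # x and y differ on whole folders
d_xz = tree_distance(x, z, TREE)  # x and z differ only within folders
```

In this toy setting, `x` and `z` come out closer than `x` and `y`, because their differences partly cancel inside the level-1 folders. The distance therefore depends on how the tree groups the features rather than on the order in which the features happen to be listed.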


