Are Clusterings of Multiple Data Views Independent?

01/12/2019
by Lucy L. Gao, et al.

In the Pioneer 100 (P100) Wellness Project (Price and others, 2017), multiple types of data are collected on a single set of healthy participants at multiple timepoints in order to characterize and optimize wellness. One way to do this is to identify clusters, or subgroups, among the participants, and then to tailor personalized health recommendations to each subgroup. It is tempting to cluster the participants using all of the data types and timepoints, in order to fully exploit the available information. However, clustering the participants based on multiple data views implicitly assumes that a single underlying clustering of the participants is shared across all data views. If this assumption does not hold, then clustering the participants using multiple data views may lead to spurious results. In this paper, we seek to evaluate the assumption that there is some underlying relationship among the clusterings from the different data views, by asking the question: are the clusters within each data view dependent or independent? We develop a new test for answering this question, which we then apply to clinical, proteomic, and metabolomic data, across two distinct timepoints, from the P100 study. We find that while the subgroups of the participants defined with respect to any single data type seem to be dependent across time, the clustering among the participants based on one data type (e.g. proteomic data) appears not to be associated with the clustering based on another data type (e.g. clinical data).
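The question posed above, whether the cluster assignments from two data views are dependent or independent, can be illustrated with a small sketch. This is not the test developed in the paper: it is a simpler stand-in, a permutation test on the chi-square statistic of the cross-tabulated cluster labels, and it treats the labels as fixed rather than estimated. The function names, the toy labels, and the parameters `n_perm` and `seed` are assumptions made for illustration only.

```python
import numpy as np


def contingency_chi2(labels_a, labels_b):
    """Pearson chi-square statistic of the cross-tabulation of two label vectors."""
    # Map arbitrary labels to 0..K-1 so they can index a count table.
    _, a = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)  # count participants in each (cluster_a, cluster_b) cell
    # Expected counts if the two clusterings were independent.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def permutation_pvalue(labels_a, labels_b, n_perm=2000, seed=0):
    """Permutation p-value for association between two clusterings of the same participants."""
    rng = np.random.default_rng(seed)
    labels_b = np.asarray(labels_b).ravel()
    observed = contingency_chi2(labels_a, labels_b)
    # Shuffling one label vector breaks any link between the views
    # while keeping the cluster sizes in each view unchanged.
    exceed = sum(
        contingency_chi2(labels_a, rng.permutation(labels_b)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)


# Toy example (hypothetical labels): 60 participants, 3 clusters per view.
rng = np.random.default_rng(1)
view1 = np.repeat([0, 1, 2], 20)
view2_shared = view1.copy()                # second view reproduces the same clustering
view2_unrelated = rng.integers(0, 3, 60)   # second view clustered independently of the first

p_shared = permutation_pvalue(view1, view2_shared, n_perm=500)
p_unrelated = permutation_pvalue(view1, view2_unrelated, n_perm=500)
print(p_shared, p_unrelated)
```

A small p-value suggests that the two clusterings are associated. Because the sketch ignores the uncertainty in the estimated cluster labels, it should be read only as an illustration of the question, not as a substitute for a test that models the clustering within each view.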

