SNeCT: Scalable network constrained Tucker decomposition for integrative multi-platform data analysis

11/22/2017
by Dongjin Choi, et al.

Motivation: How can we integratively analyze large-scale multi-platform genomic data that are high-dimensional and sparse? Furthermore, how can we systematically incorporate prior knowledge, such as associations between genes, into the analysis?

Method: To address these questions, we propose SNeCT, a Scalable Network Constrained Tucker decomposition method. SNeCT applies parallel stochastic gradient descent to a parallelizable network-constrained objective function. We apply SNeCT to a tensor constructed from PanCan12, a large-scale multi-platform, multi-cohort cancer dataset, constrained on a network built from the PathwayCommons database.

Results: We use the decomposed factor matrices to stratify cancers, to search for the top-k most similar patients, and to illustrate personalized interpretation. In the stratification test, the combined twelve-cohort data are clustered into thirteen subclasses. These subclasses correlate strongly with tissue of origin and reveal other notable patterns, such as a clear separation of OV cancers into two groups and high clinical correlation within the subclusters formed in the BRCA and UCEC cohorts. In the top-k search, a new patient's genomic profile is generated and searched against existing patients based on the factor matrices. The top-k patients are highly similar to the query on 23 clinical features, including the estrogen and progesterone receptor statuses of BRCA patients, for which average precision ranges from 0.72 to 0.86 and from 0.68 to 0.86, respectively. We also illustrate how the factor matrices can be used for interpretable, personalized analysis of each patient.
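To make the idea of a network-constrained Tucker decomposition trained by stochastic gradient descent more concrete, here is a minimal serial sketch in Python with NumPy. It is not the authors' implementation: the abstract does not give the objective, so the squared-error loss, the L2 penalty `lam`, and the neighbour-smoothing penalty `lam_net` (which pulls the factor rows of linked genes toward each other) are assumptions, as are all function names, the synthetic data, and the toy gene network. The parallel scheme that SNeCT uses is not reproduced here.

```python
import numpy as np

def predict(G, a, b, c):
    """Tucker reconstruction of one entry from the core G and one factor row per mode."""
    return np.einsum("pqr,p,q,r->", G, a, b, c)

def rmse(entries, G, A, B, C):
    """Root-mean-square error over the observed entries."""
    errs = [predict(G, A[i], B[j], C[k]) - x for i, j, k, x in entries]
    return float(np.sqrt(np.mean(np.square(errs))))

def sgd_epoch(entries, G, A, B, C, neighbors, lr=0.05, lam=0.01, lam_net=0.01):
    """One serial SGD pass over the observed entries; updates G, A, B, C in place."""
    for i, j, k, x in entries:
        a, b, c = A[i].copy(), B[j].copy(), C[k].copy()
        err = predict(G, a, b, c) - x
        grad_a = err * np.einsum("pqr,q,r->p", G, b, c) + lam * a
        grad_b = err * np.einsum("pqr,p,r->q", G, a, c) + lam * b
        # Assumed network term: pull this gene's row toward its neighbours' rows.
        for n in neighbors.get(j, ()):
            grad_b += lam_net * (b - B[n])
        grad_c = err * np.einsum("pqr,p,q->r", G, a, b) + lam * c
        grad_G = err * np.einsum("p,q,r->pqr", a, b, c) + lam * G
        A[i] -= lr * grad_a
        B[j] -= lr * grad_b
        C[k] -= lr * grad_c
        G -= lr * grad_G

# Synthetic (patient x gene x platform) tensor built from a known low-rank Tucker model.
rng = np.random.default_rng(0)
I, J, K, R = 8, 6, 3, 2
G0 = rng.uniform(size=(R, R, R))
A0, B0, C0 = (rng.uniform(size=(n, R)) for n in (I, J, K))
X = np.einsum("pqr,ip,jq,kr->ijk", G0, A0, B0, C0)

# Keep roughly 70% of the entries to mimic sparse observations.
mask = rng.uniform(size=X.shape) < 0.7
entries = [(int(i), int(j), int(k), float(X[i, j, k]))
           for i, j, k in zip(*np.nonzero(mask))]

# Hypothetical gene-gene network given as undirected edges.
edges = [(0, 1), (1, 2), (3, 4)]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

# Random initial model, then a few epochs of SGD.
G = rng.uniform(size=(R, R, R))
A, B, C = (rng.uniform(size=(n, R)) for n in (I, J, K))
rmse_before = rmse(entries, G, A, B, C)
for _ in range(30):
    sgd_epoch(entries, G, A, B, C, neighbors)
rmse_after = rmse(entries, G, A, B, C)
print(f"RMSE before: {rmse_before:.4f}, after: {rmse_after:.4f}")
```

In this sketch, each row of `A` serves as a low-dimensional patient profile. A top-k patient search of the kind described in the results could rank rows of `A` by similarity to a query row, though the abstract does not specify the similarity measure used.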

research
10/12/2022

cuFasterTucker: A Stochastic Optimization Strategy for Parallel Sparse FastTucker Decomposition on GPU Platform

Currently, the size of scientific data is growing at an unprecedented ra...
research
01/09/2018

GIFT: Guided and Interpretable Factorization for Tensors - An Application to Large-Scale Multi-platform Cancer Analysis

Given multi-platform genome data with prior knowledge of functional gene...
research
03/13/2017

SPARTan: Scalable PARAFAC2 for Large & Sparse Data

In exploratory tensor mining, a common problem is how to analyze a set o...
research
08/26/2023

Large-scale gradient-based training of Mixtures of Factor Analyzers

Gaussian Mixture Models (GMMs) are a standard tool in data analysis. How...
research
06/23/2023

Multi-objective optimization based network control principles for identifying personalized drug targets with cancer

It is a big challenge to develop efficient models for identifying person...
research
02/20/2018

AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning

Clinical prognostic models derived from largescale healthcare data can i...
