Integrative analysis of gene expression and phenotype data

06/29/2015
by Min Xu, et al.

Linking genotype to phenotype is the fundamental aim of modern genetics. We focus on studying the links between gene expression data and phenotype data through integrative analysis, and we propose three approaches.

1) The inherent complexity of phenotypes makes high-throughput phenotype profiling a difficult and laborious process. We propose an automated, multi-dimensional profiling method based on gene expression similarity. Large-scale analysis shows that our method provides robust profiling that reveals different phenotypic aspects of samples. The technique can also interpolate and extrapolate beyond the phenotype information given in the training data. It can be used in many applications, including facilitating experimental design and detecting confounding factors.

2) Phenotype association analysis is complicated by small sample sizes and high dimensionality. Consequently, phenotype-associated gene subsets obtained from training data are very sensitive to the choice of training samples, and the resulting sample phenotype classifiers tend to generalize poorly. To address these obstacles, we propose a novel approach that generates sequences of increasingly discriminative gene cluster combinations. Our experiments on both simulated and real datasets show robust and accurate classification performance.

3) Many complex phenotypes, such as cancer, are the product not only of gene expression but also of gene interaction. We propose an integrative approach to find gene network modules that are activated under different phenotype conditions. Using this method, we discovered cancer subtype-specific network modules, as well as the ways in which these modules coordinate. In particular, we detected a breast-cancer-specific tumor suppressor network module with a hub gene, PDGFRL, which may play an important role in this module.
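To make the idea of profiling by gene expression similarity (approach 1) more concrete, here is a minimal sketch. It is a generic similarity-weighted nearest-neighbour scheme, not the authors' method: the use of Pearson correlation, the top-k neighbour rule, the function names `pearson` and `similarity_profile`, and the toy data are all assumptions made for illustration. Each training sample carries a set of phenotype terms, and a query sample receives a score for every term, so the result is a multi-dimensional profile rather than a single label.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def similarity_profile(query, train_expr, train_phenotypes, k=3):
    """Hypothetical multi-dimensional phenotype profile of `query`.

    train_phenotypes[i] is the set of phenotype terms annotated on training
    sample i. Among the k training samples most correlated with the query,
    only positively correlated ones are kept; each term's score is the share
    of their total correlation carried by samples annotated with that term.
    """
    scored = [(pearson(query, x), terms)
              for x, terms in zip(train_expr, train_phenotypes)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = [(s, terms) for s, terms in scored[:k] if s > 0]
    total = sum(s for s, _ in neighbours)
    profile = defaultdict(float)
    for s, terms in neighbours:
        for term in terms:
            profile[term] += s / total
    return dict(profile)


# Toy example (made-up numbers): four training samples, four genes each.
train_expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
train_phenotypes = [{"tumor"}, {"tumor", "grade3"}, {"normal"}, {"normal"}]
profile = similarity_profile([1, 2, 3, 5], train_expr, train_phenotypes)
print(profile)
```

Because a sample can carry several terms, its scores need not sum to one; a term that never appears among the nearest neighbours simply receives no score. Any real implementation would need to choose the similarity measure, the neighbourhood size and the normalization with the data at hand.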


research
04/17/2020

Identification of deregulated transcription factors involved in subtypes of cancers

We propose a methodology for the identification of transcription factors...
research
02/28/2016

Stability and Structural Properties of Gene Regulation Networks with Coregulation Rules

Coregulation of the expression of groups of genes has been extensively d...
research
06/26/2018

Bayesian Multi-study Factor Analysis for High-throughput Biological Data

This paper presents a new modeling strategy for joint unsupervised analy...
research
08/18/2017

Data-Driven Tree Transforms and Metrics

We consider the analysis of high dimensional data given in the form of a...
research
08/29/2023

From RNA sequencing measurements to the final results: a practical guide to navigating the choices and uncertainties of gene set analysis

Gene set analysis, a popular approach for analyzing high-throughput gene...
research
06/28/2022

Statistical Depth based Normalization and Outlier Detection of Gene Expression Data

Normalization and outlier detection belong to the preprocessing of gene ...
research
02/09/2019

Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition

Sparse representation based classification (SRC) methods have achieved r...
