Netboost: Boosting-supported network analysis improves high-dimensional omics prediction in acute myeloid leukemia and Huntington's disease

09/27/2019
by   Pascal Schlosser, et al.

Background: State-of-the-art selection methods fail to identify the weak but cumulative effects of features found in many high-dimensional omics datasets. Nevertheless, these features play an important role in certain diseases. Results: We present Netboost, a three-step dimension reduction technique. First, a boosting-based filter is combined with the topological overlap measure to identify the essential edges of the network. Second, sparse hierarchical clustering is applied to the selected edges to identify modules. Third, module information is aggregated by the first principal components. The primary analysis is then carried out on these summary measures instead of the original data. We demonstrate the application of the newly developed Netboost in combination with CoxBoost for survival prediction from DNA methylation and gene expression data of 180 acute myeloid leukemia (AML) patients. Based on cross-validated prediction error curve estimates, we show that it predicts better than variable selection on the full dataset and better than an alternative clustering approach. The identified signature, related to chromatin modifying enzymes, was replicated in an independent dataset of AML patients in the phase II AMLSG 12-09 study. In a second application, we combine Netboost with Random Forest classification and reduce the disease classification error in RNA-sequencing data of Huntington's disease mice. Conclusion: Netboost improves the definition of predictive variables for survival analysis and classification. It is a freely available Bioconductor R package for dimension reduction and hypothesis generation in high-dimensional omics applications.
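
Two of the building blocks named in the abstract, the topological overlap measure and aggregation of a module by its first principal component, can be illustrated with a minimal sketch. This is not the Netboost package or its API. The function names `topological_overlap` and `first_principal_component` are hypothetical, the overlap formula is the commonly used unsigned WGCNA-style definition (an assumption, since the abstract does not spell it out), and the boosting-based filter and the sparse hierarchical clustering are not shown.

```python
import math


def topological_overlap(adj):
    """Topological overlap for a symmetric adjacency matrix with entries in [0, 1].

    Assumed (WGCNA-style, unsigned) definition:
        TOM[i][j] = (sum_u a[i][u] * a[u][j] + a[i][j]) / (min(k_i, k_j) + 1 - a[i][j])
    where u runs over all nodes except i and j, and k_i is the connectivity of node i.
    """
    n = len(adj)
    # Connectivity of each node: sum of its edge weights, excluding the diagonal.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j)
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


def first_principal_component(samples, iterations=200):
    """Per-sample scores on the first principal component of one module.

    `samples` is a list of rows (one per sample, at least two), each row holding
    the module's features. The leading eigenvector of the covariance matrix is
    approximated by power iteration; this can fail if the uniform start vector
    happens to be orthogonal to that eigenvector.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[c] for row in samples) / n for c in range(p)]
    centered = [[row[c] - means[c] for c in range(p)] for row in samples]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # all features constant: nothing to summarize
            break
        vec = [x / norm for x in nxt]
    # Project each centered sample onto the leading direction.
    return [sum(r[c] * vec[c] for c in range(p)) for r in centered]


# Toy network: a path 0 - 1 - 2. Nodes 0 and 2 share neighbor 1 but no direct edge.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
print(topological_overlap(path)[0][2])  # 0.5

# Toy module with two perfectly correlated features measured on three samples.
module = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(first_principal_component(module))
```

In the toy network, nodes 0 and 2 have a nonzero overlap despite having no direct edge, which is the property that makes the measure useful for finding modules. The principal component scores give one summary value per sample for the module; summaries of this kind would then be passed to a downstream model instead of the original features.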


