Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome

08/01/2020
by   Alessandra Cabassi, et al.

Building classification models that predict a binary class label from high-dimensional multi-omic datasets poses several challenges, because the data layers typically differ widely in number of predictors, type of data, and level of noise. Previous research has shown that applying classical logistic regression with an elastic-net penalty to these datasets can lead to poor results (Liu et al., 2018). We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately and a predictive model is then built using the variables selected in the first step. We compare our approach to other methods developed for the same purpose, and we adapt existing software for multi-omic linear regression (Zhao and Zucknick, 2020) to the logistic regression setting. Extensive simulation studies show that our approach should be preferred if the goal is to select as many relevant predictors as possible while achieving prediction performance comparable to that of the best competitors. Our motivating example is a cardiometabolic syndrome dataset comprising eight 'omic data types for two extreme phenotype groups (10 obese and 10 lipodystrophy individuals) and 185 blood donors. Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level. R code is available at https://github.com/acabassi/logistic-regression-for-multi-omic-data.
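The two-step idea described above can be illustrated with a minimal sketch. This is not the authors' R implementation: it is a pure-Python toy that assumes an elastic-net penalty for the per-layer selection step and, as a further assumption, a ridge penalty for the second-step model. The function names (fit_enet_logistic, two_step_fit), the proximal gradient solver, the penalty values, and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_enet_logistic(X, y, lam, alpha, step=0.5, n_iter=300):
    """Elastic-net logistic regression fitted by proximal gradient descent.

    X is a list of rows, y a list of 0/1 labels. lam is the overall penalty
    strength and alpha the L1/L2 mix (alpha=1 is lasso, alpha=0 is ridge).
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: fitted probability minus observed label.
        resid = []
        for row, label in zip(X, y):
            z = intercept + sum(b * v for b, v in zip(beta, row))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            resid.append(1.0 / (1.0 + math.exp(-z)) - label)
        intercept -= step * sum(resid) / n  # the intercept is not penalised
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            grad += lam * (1.0 - alpha) * beta[j]  # ridge part of the penalty
            beta[j] = soft_threshold(beta[j] - step * grad, step * lam * alpha)
    return intercept, beta


def two_step_fit(layers, y, lam=0.1, alpha=0.5):
    """Step 1: select variables per layer. Step 2: refit on their union."""
    selected = []  # (layer index, column index) pairs
    for k, X in enumerate(layers):
        _, beta = fit_enet_logistic(X, y, lam, alpha)
        selected += [(k, j) for j, b in enumerate(beta) if b != 0.0]
    # Assumption: the second-step model is ridge-penalised (alpha=0).
    X_sel = [[layers[k][i][j] for k, j in selected] for i in range(len(y))]
    intercept, coefs = fit_enet_logistic(X_sel, y, lam, alpha=0.0)
    return selected, intercept, coefs


# Toy data: layer 0 holds one informative column, layer 1 is pure noise.
random.seed(0)
n = 80
layer0 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
layer1 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [1 if row[0] > 0 else 0 for row in layer0]

selected, intercept, coefs = two_step_fit([layer0, layer1], y)
print("selected (layer, column):", selected)
print("second-step coefficients:", [round(c, 3) for c in coefs])
```

Selecting within each layer separately means a large, noisy layer cannot crowd out a small, informative one during selection, which is one plausible reading of why a joint elastic-net fit can struggle on such data. In practice the penalty parameters would be tuned, for example by cross-validation, rather than fixed as they are here.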
