Binacox: automatic cut-points detection in high-dimensional Cox model, with applications to genetic data

07/25/2018
by   Simon Bussy, et al.

Determining significant prognostic biomarkers is of increasing importance in many areas of medicine. In order to translate a continuous biomarker into a clinical decision, it is often necessary to determine cut-points. There is so far no standard method to help evaluate how many cut-points are optimal for a given feature in a survival analysis setting. Moreover, most existing methods are univariate, and hence not well suited to high-dimensional frameworks. This paper introduces a prognostic method called Binacox to deal with the problem of detecting multiple cut-points per feature in a multivariate setting where a large number of continuous features are available. It is based on the Cox model and combines one-hot encodings with the binarsity penalty. This penalty uses total-variation regularization together with an extra linear constraint to avoid collinearity between the one-hot encodings and to enable feature selection. A non-asymptotic oracle inequality is established. The statistical performance of the method is then examined in an extensive Monte Carlo simulation study, and finally illustrated on three publicly available genetic cancer datasets with high-dimensional features. On these datasets, the proposed methodology significantly outperforms state-of-the-art survival models regarding risk prediction in terms of C-index, with computing times that are orders of magnitude shorter. In addition, it provides powerful interpretability by automatically pinpointing significant cut-points on relevant features from a clinical point of view.
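
To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch of a one-hot encoding of a continuous feature and of a total-variation penalty on the corresponding coefficient block. It is an illustration under stated assumptions, not the authors' implementation: the quantile-based binning, the unit weights, the plain sum-to-zero form of the linear constraint, and all function names (`one_hot_quantile_bins`, `total_variation`, `satisfies_sum_to_zero`, `candidate_cut_points`) are hypothetical choices made for this example.

```python
import numpy as np


def one_hot_quantile_bins(x, n_bins):
    """One-hot encode a continuous feature into n_bins intervals.

    Assumption: bin edges are equally spaced empirical quantiles of x.
    Returns the binary design block and the interior bin edges.
    """
    # Interior edges only: n_bins intervals need n_bins - 1 edges.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # Map each value to the index of the interval it falls in (0..n_bins-1).
    idx = np.searchsorted(edges, x, side="right")
    return np.eye(n_bins, dtype=int)[idx], edges


def total_variation(theta_block, weights=None):
    """Weighted sum of absolute differences between consecutive coefficients.

    Assumption: unit weights when none are given.
    """
    diffs = np.abs(np.diff(theta_block))
    if weights is None:
        weights = np.ones_like(diffs)
    return float(np.sum(weights * diffs))


def satisfies_sum_to_zero(theta_block, tol=1e-8):
    """Check an assumed linear constraint: the block's coefficients sum to zero."""
    return abs(float(np.sum(theta_block))) <= tol


def candidate_cut_points(theta_block, edges, tol=1e-8):
    """Return the bin edges at which consecutive coefficients differ."""
    jumps = np.flatnonzero(np.abs(np.diff(theta_block)) > tol)
    return edges[jumps]


# Toy example: ten evenly spaced values split into five bins.
x = np.arange(10.0)
block, edges = one_hot_quantile_bins(x, n_bins=5)

# A piecewise-constant coefficient block with two jumps.
theta = np.array([-0.5, -0.5, 0.25, 0.25, 0.5])
penalty = total_variation(theta)
cuts = candidate_cut_points(theta, edges)
```

In this sketch each row of `block` has exactly one active column, so the columns are collinear with an intercept, which is the kind of redundancy a linear constraint on the coefficient block is meant to remove. A total-variation penalty favors piecewise-constant blocks, and the positions where consecutive coefficients still differ are the candidate cut-points.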

research
09/07/2023

FLASH: a Fast joint model for Longitudinal And Survival data in High dimension

This paper introduces a prognostic method called FLASH that addresses th...
research
03/24/2017

Binarsity: a penalization for one-hot encoded features

This paper deals with the problem of large-scale linear supervised learn...
research
10/24/2016

C-mix: a high dimensional mixture model for censored durations, with applications to genetic data

We introduce a mixture model for censored durations (C-mix), and develop...
research
02/17/2022

Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression

In traditional logistic regression models, the link function is often as...
research
03/22/2020

High-dimensional inference for inhomogeneous Gibbs point processes

Gibbs point processes (GPPs) constitute a large and flexible class of sp...
research
05/27/2022

Hazard Gradient Penalty for Survival Analysis

Survival analysis appears in various fields such as medicine, economics,...