Optimal Cox Regression Subsampling Procedure with Rare Events

12/03/2020
by Nir Keret, et al.

Massive survival datasets are becoming increasingly prevalent with the development of the healthcare industry. Such datasets pose computational challenges unprecedented in traditional survival analysis use cases. A popular way of coping with massive datasets is to downsample them to a more manageable size, so that the required computational resources are within the researcher's means. Cox proportional hazards regression remains one of the most popular statistical models for the analysis of survival data. This work addresses the setting of right-censored and possibly left-truncated data with rare events, such that the observed failure times constitute only a small portion of the overall sample. We propose Cox regression subsampling-based estimators that approximate their full-data partial-likelihood-based counterparts by assigning optimal sampling probabilities to censored observations and including all observed failures in the analysis. Asymptotic properties of the proposed estimators are established under suitable regularity conditions, and simulation studies are carried out to evaluate the finite-sample performance of the estimators. We further apply our procedure to UK Biobank data on colorectal cancer genetic and environmental risk factors.
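The core idea above is to keep every observed failure, retain each censored observation only with some sampling probability, and reweight the risk sets. It can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' estimator. It assumes independent Bernoulli sampling of censored observations with inverse-probability weights 1/π_i in the risk-set sums. It uses uniform probabilities π_i = 0.2 in place of the optimal probabilities derived in the paper, and it handles ties with the Breslow approximation. The helper names `subsample_censored` and `weighted_cox_newton`, and the simulated data, are hypothetical.

```python
import numpy as np


def subsample_censored(event, probs, rng):
    """Keep every observed failure; keep censored observation i with probability probs[i].

    Hypothetical helper: independent Bernoulli sampling is an assumption,
    not necessarily the sampling scheme used in the paper.
    Returns a boolean keep-mask and inverse-probability weights (1 for failures).
    """
    is_fail = event == 1
    keep = is_fail | (rng.random(event.shape[0]) < probs)
    weights = np.where(is_fail, 1.0, 1.0 / probs)
    return keep, weights


def weighted_cox_newton(X, time, event, weights, entry=None, max_iter=50, tol=1e-8):
    """Newton-Raphson for a weighted Cox log partial likelihood (Breslow ties).

    Subject j is at risk at failure time t if entry[j] < t <= time[j], so
    left-truncated data can be passed through `entry`.
    """
    n, p = X.shape
    if entry is None:
        entry = np.full(n, -np.inf)  # no left truncation
    beta = np.zeros(p)
    fail_idx = np.flatnonzero(event == 1)
    for _ in range(max_iter):
        eta = X @ beta
        r = weights * np.exp(eta - eta.max())  # the shift cancels in every ratio below
        grad = np.zeros(p)
        hess = np.zeros((p, p))
        for i in fail_idx:
            at_risk = (entry < time[i]) & (time >= time[i])
            Xr, rr = X[at_risk], r[at_risk]
            s0 = rr.sum()
            xbar = rr @ Xr / s0  # weighted mean covariate over the risk set
            grad += weights[i] * (X[i] - xbar)
            hess -= weights[i] * ((Xr.T * rr) @ Xr / s0 - np.outer(xbar, xbar))
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical simulated data with rare events (roughly 1-2% observed failures).
rng = np.random.default_rng(2020)
n = 20_000
beta_true = np.array([0.5, -0.5])
X = rng.standard_normal((n, 2))
T = -np.log(rng.random(n)) / (0.01 * np.exp(X @ beta_true))  # exponential PH event times
time = np.minimum(T, 1.0)  # administrative censoring at t = 1
event = (T <= 1.0).astype(int)

# Full-data estimator: all weights equal to 1.
beta_full = weighted_cox_newton(X, time, event, np.ones(n))

# Subsample: all failures kept, censored kept with uniform probability (NOT the optimal one).
probs = np.full(n, 0.2)
keep, w = subsample_censored(event, probs, rng)
beta_sub = weighted_cox_newton(X[keep], time[keep], event[keep], w[keep])
print(event.sum(), keep.sum(), beta_full, beta_sub)
```

Because every failure is retained with weight 1, only the risk-set sums in the denominators are estimated from the subsample. Replacing the uniform `probs` with data-dependent probabilities is the point where an optimal sampling design, such as the one proposed in the paper, would enter.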

research
02/05/2023

Optimal subsampling for the Cox proportional hazards model with massive survival data

The use of massive survival data has become common in survival analysis....
research
01/30/2019

Causal Proportional Hazards Estimation with a Binary Instrumental Variable

Instrumental variables (IV) are a useful tool for estimating causal effe...
research
04/02/2018

A Fast Divide-and-Conquer Sparse Cox Regression

We propose a computationally and statistically efficient divide-and-conq...
research
04/18/2022

Massive Parallelization of Massive Sample-size Survival Analysis

Large-scale observational health databases are increasingly popular for ...
research
09/03/2023

Unlocking Retrospective Prevalent Information in EHRs – a Pairwise Pseudolikelihood Approach

Typically, electronic health record data are not collected towards a spe...
research
10/03/2021

A Sequential Addressing Subsampling Method for Massive Data Analysis under Memory Constraint

The emergence of massive data in recent years brings challenges to autom...
research
11/01/2019

Residual Analysis for Regression with Censored Data via Randomized Survival Probabilities

Residual analysis is extremely important in regression modelling. Residu...