Development and external validation of a lung cancer risk estimation tool using gradient-boosting

08/23/2023
by Pierre-Louis Benveniste, et al.

Lung cancer is a significant cause of mortality worldwide, which makes early detection important for improving survival rates. In this study, we propose a machine learning (ML) tool, trained on data from the PLCO Cancer Screening Trial and validated on the NLST, that estimates the likelihood of lung cancer occurrence within five years. The study used two datasets, PLCO (n=55,161) and NLST (n=48,595), which contain comprehensive information on risk factors, clinical measurements, and outcomes related to lung cancer. Data preprocessing removed patients who were not current or former smokers and those who had died of causes unrelated to lung cancer. Particular attention was paid to mitigating bias caused by censored data. Feature selection, hyper-parameter optimization, and model calibration were performed using XGBoost, an ensemble learning algorithm that combines gradient boosting and decision trees. The ML model was trained on the pre-processed PLCO dataset and tested on the NLST dataset. The model incorporated features such as age, gender, smoking history, medical diagnoses, and family history of lung cancer. The model was well-calibrated (Brier score=0.044). ROC-AUC was 82 [...] dataset and 70 [...] compared to the USPSTF guidelines for lung cancer screening, our model provided the same recall with a precision of 13.1 [...] vs. 3.1 [...] web application for estimating the likelihood of developing lung cancer within five years. By entering risk factors and clinical data, individuals can assess their risk and make informed decisions regarding lung cancer screening. This research contributes to early detection and prevention strategies that aim to reduce lung cancer-related mortality.
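The two evaluation ideas named in the abstract, calibration measured by the Brier score and precision compared at the same recall as a fixed screening rule, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and probabilities, and the `rule_flag` list (a stand-in for a guideline-style eligibility rule) are all hypothetical, and ties between predicted probabilities are not treated specially.

```python
from typing import Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def precision_recall(y_true: Sequence[int], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of a binary screening decision."""
    tp = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(y_true, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(y_true, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def precision_at_recall(y_true: Sequence[int], y_prob: Sequence[float],
                        target_recall: float) -> float:
    """Flag people in order of decreasing predicted risk until the target
    recall is reached, then report the precision among those flagged."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("no positive cases in y_true")
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    for n_flagged, (_, y) in enumerate(ranked, start=1):
        tp += y
        if tp / total_pos >= target_recall:
            return tp / n_flagged
    return tp / len(ranked)


# Hypothetical toy cohort: 1 = lung cancer within the horizon, 0 = none.
y_true = [1, 0, 0, 1, 0, 0, 0, 0]
# Hypothetical model outputs (predicted probabilities).
y_prob = [0.9, 0.2, 0.1, 0.6, 0.7, 0.05, 0.3, 0.1]
# Hypothetical output of a fixed eligibility rule (True = refer to screening).
rule_flag = [True, True, False, True, True, False, True, False]

rule_precision, rule_recall = precision_recall(y_true, rule_flag)
model_precision = precision_at_recall(y_true, y_prob, rule_recall)

print("Brier score:", brier_score(y_true, y_prob))
print("Rule precision / recall:", rule_precision, rule_recall)
print("Model precision at the rule's recall:", model_precision)
```

Matching recall first and then comparing precision keeps the comparison fair: both approaches catch the same share of cases, so the difference lies only in how many people without cancer are referred for screening.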
