A Semi-Supervised Machine Learning Approach to Detecting Recurrent Metastatic Breast Cancer Cases Using Linked Cancer Registry and Electronic Medical Record Data

01/17/2019
by Albee Y. Ling, et al.

Objectives: Most cancer data sources lack information on metastatic recurrence. Electronic medical records (EMRs) and population-based cancer registries contain complementary information on cancer treatment and outcomes, yet are rarely used synergistically. To enable detection of metastatic breast cancer (MBC), we applied a semi-supervised machine learning framework to linked EMR-California Cancer Registry (CCR) data. Materials and Methods: We studied 11,459 female patients treated at Stanford Health Care who received an incident breast cancer diagnosis from 2000-2014. The dataset consisted of structured data and unstructured free-text clinical notes from the EMR, linked to the CCR, a component of the Surveillance, Epidemiology and End Results (SEER) database. We extracted information on metastatic disease from patient notes to infer a class label and then trained a regularized logistic regression model for MBC classification. We evaluated model performance on a gold standard set of 146 patients. Results: There were 495 patients with de novo stage IV MBC, 1,374 patients initially diagnosed with stage 0-III disease who had recurrent MBC, and 9,590 with no evidence of metastasis. The median follow-up time was 96.3 months (mean 97.8, standard deviation 46.7). The best-performing model incorporated both EMR and CCR features. Its area under the receiver-operating characteristic curve was 0.925 [95% ...], with specificity of 0.878 and overall accuracy of 0.870. Discussion and Conclusion: A framework for MBC case detection combining EMR and CCR data achieved good sensitivity, specificity and discrimination without requiring expert-labeled examples. This approach enables population-based research on how patients die from cancer and may identify novel predictors of cancer recurrence.
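The workflow summarized in the abstract (infer a noisy class label from notes, fit a regularized logistic regression, then score it against a gold standard) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the keyword patterns, the two-column feature layout, the L2 penalty, and the plain gradient-descent fit. The abstract does not state which regularizer, features, or extraction rules the authors used.

```python
import math
import re

# Hypothetical patterns; the paper's actual note-extraction rules are not given here.
METASTASIS_PATTERN = re.compile(r"\bmetasta(sis|ses|tic)\b", re.IGNORECASE)
NEGATION_PATTERN = re.compile(r"\bno (evidence of )?metasta", re.IGNORECASE)


def weak_label(note):
    """Return a noisy label: 1 if the note mentions metastasis without a simple negation."""
    if NEGATION_PATTERN.search(note):
        return 0
    return 1 if METASTASIS_PATTERN.search(note) else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l2=0.01, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent with an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The intercept is left unpenalized (an assumed, common convention).
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Classify each row as 1 when the predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]


def evaluate(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) against gold-standard labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity, (tp + tn) / len(y_true)
```

In use, `weak_label` would supply the training labels for `fit_logistic`, while `evaluate` would be run only on the manually reviewed gold standard set, so that noise in the inferred labels does not inflate the reported metrics.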

research
06/13/2018

Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer

Accurately identifying distant recurrences in breast cancer from the Ele...
research
01/13/2020

Model-assisted cohort selection with bias analysis for generating large-scale cohorts from the EHR for oncology research

Objective Electronic health records (EHRs) are a promising source of dat...
research
01/09/2018

Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met) Utilizing Free-Text Clinical Narratives

We propose a deep learning model - Probabilistic Prognostic Estimates of...
research
05/18/2022

A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases

Clinicians may rely on medical coding systems such as International Clas...
research
07/20/2018

Knowledge Integration for Disease Characterization: A Breast Cancer Example

With the rapid advancements in cancer research, the information that is ...
research
04/02/2019

A frame semantic overview of NLP-based information extraction for cancer-related EHR notes

Objective: There is a lot of information about cancer in Electronic Heal...
