Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer

06/13/2018
by Zexian Zeng, et al.

Accurately identifying distant recurrences in breast cancer from electronic health records (EHRs) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, distant recurrence identification still relies heavily on manual chart review. In this study, we develop a model that identifies distant recurrences in breast cancer using clinical narratives and structured data from the EHR. We apply MetaMap to extract features from clinical narratives and retrieve structured clinical data from the EHR. Using these features, we train a support vector machine model to identify distant recurrences in breast cancer patients. We train the model on 1,396 double-annotated subjects and validate it on 599 double-annotated held-out subjects. In addition, we validate the model on a set of 4,904 single-annotated subjects as a generalization test. The model achieved an area under the curve (AUC) of 0.92 (SD=0.01) in cross-validation on the training dataset, and AUCs of 0.95 and 0.93 on the held-out test (599 subjects) and the generalization test (4,904 subjects), respectively. Our model can accurately and efficiently identify distant recurrences in breast cancer by combining features extracted from unstructured clinical narratives and structured clinical data.
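
Since the results are reported as AUC scores, a minimal sketch of how an AUC can be computed from classifier scores may help readers interpret them. This is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are assumptions made up for illustration. It uses the pairwise definition of AUC, the probability that a randomly chosen positive subject (distant recurrence) receives a higher score than a randomly chosen negative one.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (not from the study):
# 1 = distant recurrence, 0 = no distant recurrence.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of subjects, which is fine for a sketch; for cohorts the size of the generalization set, a rank-based implementation from an established library would be the more practical choice.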


