Model-assisted cohort selection with bias analysis for generating large-scale cohorts from the EHR for oncology research

01/13/2020
by Benjamin Birnbaum, et al.

Objective: Electronic health records (EHRs) are a promising source of data for health outcomes research in oncology. A challenge in using EHR data is that selecting cohorts of patients often requires information in unstructured parts of the record. Machine learning has been used to address this, but even high-performing algorithms may select patients in a non-random manner and bias the resulting cohort. To improve the efficiency of cohort selection while measuring potential bias, we introduce a technique called Model-Assisted Cohort Selection (MACS) with Bias Analysis and apply it to the selection of metastatic breast cancer (mBC) patients.

Materials and Methods: We trained a model on 17,263 patients using term-frequency inverse-document-frequency (TF-IDF) and logistic regression. We used a test set of 17,292 patients to measure algorithm performance and perform Bias Analysis. We compared the cohort generated by MACS to the cohort that would have been generated without MACS as a reference standard, first by comparing distributions of an extensive set of clinical and demographic variables and then by comparing the results of two analyses addressing existing example research questions.

Results: Our algorithm had an area under the curve (AUC) of 0.976, a sensitivity of 96.0%, and an efficiency gain of 77.9%. Bias Analysis found no large differences in baseline characteristics and no differences in the example analyses.

Conclusion: MACS with Bias Analysis can significantly improve the efficiency of cohort selection on EHR data while instilling confidence that outcomes research performed on the resulting cohort will not be biased.
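To make the sensitivity and efficiency figures concrete, the following is a minimal sketch of how such metrics could be computed once a model has scored each patient. It is not the authors' implementation: the function name `macs_metrics`, the threshold, the toy scores, and the definition of efficiency gain (the share of patients that no longer need manual chart review) are all assumptions made for illustration.

```python
def macs_metrics(scores, labels, threshold):
    """Return (sensitivity, efficiency_gain) for a score threshold.

    scores: model probabilities, one per patient (hypothetical values).
    labels: 1 if the patient truly belongs in the cohort, else 0.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    # Patients at or above the threshold are sent for manual chart review.
    flagged = [s >= threshold for s in scores]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    positives = sum(labels)
    sensitivity = true_pos / positives if positives else float("nan")
    # Assumed definition: share of patients that skip manual review.
    efficiency_gain = 1 - sum(flagged) / len(scores)
    return sensitivity, efficiency_gain


# Toy data, invented for this example only.
scores = [0.95, 0.80, 0.40, 0.10, 0.05, 0.02, 0.70, 0.01]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
sens, gain = macs_metrics(scores, labels, threshold=0.5)
print(f"sensitivity={sens:.3f}, efficiency gain={gain:.3f}")
```

Lowering the threshold in this sketch raises sensitivity but sends more patients to manual review, which lowers the efficiency gain. The patients missed at a given threshold are the ones a bias analysis would compare against the selected cohort.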
