RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

07/06/2017
by Ji-Sung Kim, et al.

Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), and area under the curve for receiver operating characteristic plots (all p < 10^-6). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.
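The three metrics named in the abstract (accuracy, cross-entropy loss, and area under the ROC curve) can each be computed from a classifier's predicted class probabilities. The sketch below is illustrative only and is not the authors' evaluation code: the function names and the four-sample toy data are hypothetical, and averaging one-vs-rest AUC equally across classes is an assumption, since the abstract does not say how AUC was aggregated over multiple classes.

```python
import math


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class equals the label."""
    hits = 0
    for label, p in zip(y_true, probs):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += predicted == label
    return hits / len(y_true)


def cross_entropy(y_true, probs, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(p[label], eps)) for label, p in zip(y_true, probs))
    return -total / len(y_true)


def binary_auc(is_positive, scores):
    """ROC AUC as the chance a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs):
    """Assumed aggregation: unweighted mean of one-vs-rest AUC per class."""
    n_classes = len(probs[0])
    per_class = [
        binary_auc([label == k for label in y_true], [p[k] for p in probs])
        for k in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Hypothetical toy data: 4 samples, 3 classes (not from the paper).
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.5, 0.2, 0.3],
]

acc = accuracy(y_true, probs)
loss = cross_entropy(y_true, probs)
auc = macro_ovr_auc(y_true, probs)
print(f"accuracy={acc:.3f} cross_entropy={loss:.3f} macro_auc={auc:.3f}")
```

Accuracy only checks the top-ranked class, while cross-entropy and AUC also reward well-calibrated and well-ranked probabilities. That difference is one reason a comparison of imputation methods might report all three.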
