EVA: Generating Longitudinal Electronic Health Records Using Conditional Variational Autoencoders

12/18/2020
by   Siddharth Biswal, et al.

Researchers require timely access to real-world longitudinal electronic health records (EHRs) to develop, test, validate, and implement machine learning solutions that improve the quality and efficiency of healthcare. In contrast, health systems deeply value patient privacy and data security. De-identified EHRs do not adequately address the needs of health systems, as de-identified data are susceptible to re-identification and their volume is limited. Synthetic EHRs offer a potential solution. In this paper, we propose EHR Variational Autoencoder (EVA) for synthesizing sequences of discrete EHR encounters (e.g., clinical visits) and encounter features (e.g., diagnoses, medications, procedures). We illustrate that EVA can produce realistic EHR sequences, account for individual differences among patients, and be conditioned on specific disease conditions, thus enabling disease-specific studies. We design efficient, accurate inference algorithms by combining stochastic gradient Markov Chain Monte Carlo with amortized variational inference. We assess the utility of the methods on large real-world EHR repositories containing over 250,000 patients. Our experiments, which include user studies with knowledgeable clinicians, indicate that the generated EHR sequences are realistic. We confirmed that the performance of predictive models trained on the synthetic data is similar to that of models trained on real EHRs. Additionally, our findings indicate that augmenting real data with synthetic EHRs results in the best predictive performance, improving the best baseline by as much as 8
