Outcome identification in electronic health records using predictions from an enriched Dirichlet process mixture

by Bret Zeldow, et al.

We propose a novel semiparametric model for the joint distribution of a continuous longitudinal outcome and baseline covariates using an enriched Dirichlet process (EDP) prior. This joint model decomposes into a linear mixed model for the outcome given the covariates and marginal models for the covariates. The nonparametric EDP prior is placed on the regression and spline coefficients, the error variance, and the parameters governing the covariate (predictor) space. We predict the outcome at unobserved time points both for subjects with data at other time points and for new subjects with only baseline covariates. We find improved prediction over mixed models with Dirichlet process (DP) priors when the number of covariates is large. We demonstrate our method on electronic health records of patients initiating second-generation antipsychotic medications, which are known to increase the risk of diabetes. We use our model to predict laboratory values indicative of diabetes for each individual and assess the incidence of suspected diabetes from the predicted dataset. Our model also serves as a functional clustering algorithm in which subjects are grouped by similar longitudinal trajectories of the outcome.
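As a rough illustration of the nested clustering an EDP prior induces, the sketch below draws cluster labels from the standard EDP Pólya-urn scheme: each subject first joins an outer cluster (shared outcome-model parameters θ, i.e. regression and spline coefficients and error variance) and then, within that outer cluster, an inner cluster (shared covariate parameters ω). The function name edp_urn_partition and the concentration parameters alpha_theta and alpha_omega are illustrative assumptions. The sketch draws only the partition, not parameter values from the base measures, and it is not the authors' posterior sampler.

```python
import random
from typing import List, Tuple


def edp_urn_partition(n: int, alpha_theta: float, alpha_omega: float,
                      seed: int = 0) -> List[Tuple[int, int]]:
    """Draw a nested partition of n subjects from an EDP Polya-urn scheme.

    Hypothetical helper for illustration. Returns (outer, inner) labels:
    `outer` indexes the outcome-model cluster (theta), `inner` indexes the
    covariate cluster (omega) nested inside that outer cluster.
    """
    rng = random.Random(seed)
    outer_counts: List[int] = []        # n_j: subjects in outer cluster j
    inner_counts: List[List[int]] = []  # n_{k|j}: subjects in inner cluster k of j
    labels: List[Tuple[int, int]] = []
    for _ in range(n):
        # Outer step: join cluster j w.p. proportional to n_j,
        # or open a new outer cluster w.p. proportional to alpha_theta.
        outer_w = outer_counts + [alpha_theta]
        j = rng.choices(range(len(outer_w)), weights=outer_w)[0]
        if j == len(outer_counts):
            outer_counts.append(0)
            inner_counts.append([])
        # Inner step, conditional on j: join inner cluster k w.p. proportional
        # to n_{k|j}, or open a new one w.p. proportional to alpha_omega.
        inner_w = inner_counts[j] + [alpha_omega]
        k = rng.choices(range(len(inner_w)), weights=inner_w)[0]
        if k == len(inner_counts[j]):
            inner_counts[j].append(0)
        outer_counts[j] += 1
        inner_counts[j][k] += 1
        labels.append((j, k))
    return labels


# Usage: count outcome-model clusters and (nested) covariate clusters.
labels = edp_urn_partition(200, alpha_theta=1.0, alpha_omega=1.0, seed=1)
n_outer = len({j for j, _ in labels})
n_inner = len(set(labels))
print(n_outer, n_inner)
```

Because covariate clusters are nested inside outcome-model clusters, n_inner is never smaller than n_outer: the covariates can be partitioned finely without forcing extra splits of the regression parameters, which is the usual motivation for an EDP over a plain DP when covariates are numerous.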




