Supervised multi-specialist topic model with applications on large-scale electronic health record data

05/04/2021
by Ziyang Song, et al.

Motivation: Electronic health record (EHR) data provide a new avenue to elucidate disease comorbidities and latent phenotypes for precision medicine. To fully exploit this potential, a realistic generative process for the EHR data needs to be modelled. We present MixEHR-S to jointly infer specialist-disease topics from the EHR data. As the key contribution, we model the specialist assignments and ICD-coded diagnoses as latent topics based on each patient's underlying disease topic mixture in a novel unified supervised hierarchical Bayesian topic model. For efficient inference, we developed a closed-form collapsed variational inference algorithm to learn the model distributions of MixEHR-S. We applied MixEHR-S to two independent large-scale EHR databases in Quebec with three targeted applications: (1) congenital heart disease (CHD) diagnostic prediction among 154,775 patients; (2) chronic obstructive pulmonary disease (COPD) diagnostic prediction among 73,791 patients; (3) future insulin treatment prediction among 78,712 patients diagnosed with diabetes, as a means of assessing disease exacerbation. In all three applications, MixEHR-S yielded clinically meaningful latent topics among the most predictive latent topics and achieved superior target prediction accuracy compared to existing methods, providing opportunities for prioritizing high-risk patients for healthcare services. MixEHR-S source code and scripts of the experiments are freely available at https://github.com/li-lab-mcgill/mixehrS

research
11/15/2019

Prediction Focused Topic Models for Electronic Health Records

Electronic Health Record (EHR) data can be represented as discrete count...
research
07/24/2018

Hierarchical infinite factor model for improving the prediction of surgical complications for geriatric patients

We develop a hierarchical infinite latent factor model (HIFM) to appropr...
research
11/01/2018

A latent topic model for mining heterogenous non-randomly missing electronic health records data

Electronic health records (EHR) are rich heterogeneous collection of pat...
research
09/19/2021

Co-occurrence of medical conditions: Exposing patterns through probabilistic topic modeling of SNOMED codes

Patients associated with multiple co-occurring health conditions often f...
research
06/03/2022

Modeling electronic health record data using a knowledge-graph-embedded topic model

The rapid growth of electronic health record (EHR) datasets opens up pro...
research
03/14/2020

Using Data Assimilation of Mechanistic Models to Estimate Glucose and Insulin Metabolism

Motivation: There is a growing need to integrate mechanistic models of b...
research
07/16/2023

The Potential and Pitfalls of using a Large Language Model such as ChatGPT or GPT-4 as a Clinical Assistant

Recent studies have demonstrated promising performance of ChatGPT and GP...
