Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale

03/14/2020
by   Isotta Landi, et al.

Objective: Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent the widespread practice of scalable EHR-based stratification analysis. Here, we present a novel unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that enable efficient and effective patient stratification at scale.

Materials and methods: We considered EHRs of 1,608,741 patients from a diverse hospital cohort, comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., "ConvAE") to transform patient trajectories into low-dimensional latent vectors. We evaluated how well these representations enable patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts.

Results: ConvAE significantly outperformed several common baselines in a clustering task to identify patients with different complex conditions, with average scores of 2.61 for entropy and 0.31 for purity. When applied to stratify patients within a given condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease, and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity.

Conclusions: Patient representations derived from modeling EHRs with ConvAE can help develop therapeutic strategies for personalized medicine and better understand varying etiologies in heterogeneous sub-populations.
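
The abstract reports clustering quality as entropy and purity. As a rough illustration of how such scores are commonly computed, here is a minimal sketch in plain Python. It assumes the conventional definitions (purity as the fraction of patients that fall in the majority class of their cluster; entropy as the size-weighted average of per-cluster Shannon entropy, using log base 2). The authors' exact formulas and log base may differ, and the function name `cluster_purity_entropy`, the toy cluster assignments, and the disease labels are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def cluster_purity_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) for a clustering against known labels.

    Assumed conventions (not confirmed by the source):
      purity  = sum over clusters of (majority-class count) / N
      entropy = sum over clusters of (cluster size / N) * H(cluster),
                where H is Shannon entropy in bits (log base 2).
    """
    if not cluster_ids or len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")

    # Group the true labels by the cluster each patient was assigned to.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)

    n = len(true_labels)
    purity = 0.0
    entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        size = len(labels)
        # Contribution of this cluster's majority class to overall purity.
        purity += max(counts.values()) / n
        # Shannon entropy of the class distribution inside this cluster.
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy


# Hypothetical toy data: six patients, two clusters, three conditions.
clusters = [0, 0, 0, 1, 1, 1]
diseases = ["T2D", "T2D", "PD", "PD", "PD", "AD"]
purity, entropy = cluster_purity_entropy(clusters, diseases)
print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Under these definitions, higher purity and lower entropy both indicate clusters that align more closely with the known conditions.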


