
Transformer-based unsupervised patient representation learning based on medical claims for risk stratification and analysis

by Xianlong Zeng, et al.

Claims data, which contain medical codes, service information, and incurred expenditure, can be a valuable resource for estimating an individual's health condition and medical risk level. In this study, we developed the Transformer-based Multimodal AutoEncoder (TMAE), an unsupervised learning framework that learns efficient patient representations by encoding meaningful information from claims data. TMAE is motivated by the practical need in healthcare to stratify patients into different risk levels to improve care delivery and management. Compared with previous approaches, TMAE is able to 1) model inpatient, outpatient, and medication claims collectively, 2) handle irregular time intervals between medical events, 3) alleviate the sparsity of rare medical codes, and 4) incorporate medical expenditure information. We trained TMAE on a real-world pediatric claims dataset containing more than 600,000 patients and compared its performance with various approaches in two clustering tasks. Experimental results show that TMAE outperforms all baselines. We also conduct multiple downstream applications to illustrate the effectiveness of our framework. The promising results confirm that the TMAE framework is scalable to large claims data and is able to generate efficient patient embeddings for risk stratification and analysis.
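The abstract does not say how TMAE handles irregular time intervals between medical events. One common approach, shown here only as an illustrative sketch and not as TMAE's actual mechanism, is to feed the elapsed time in days since a patient's first claim into the Transformer's sinusoidal encoding in place of the integer sequence position. The function names `time_encoding` and `encode_claim_times`, the use of days as the time unit, the base of 10000, and `d_model = 8` are all assumptions made for illustration.

```python
import math
from datetime import date
from typing import List


def time_encoding(t_days: float, d_model: int = 8) -> List[float]:
    """Sinusoidal encoding of a continuous timestamp in days (assumed unit).

    Dimension 2i holds sin(t / 10000**(2i/d_model)) and dimension 2i+1 holds
    cos of the same angle, i.e. the standard Transformer positional encoding
    with the sequence index replaced by elapsed time.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    enc: List[float] = []
    for i in range(d_model // 2):
        angle = t_days / (10000 ** (2 * i / d_model))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc


def encode_claim_times(service_dates: List[date], d_model: int = 8) -> List[List[float]]:
    """Encode each claim by the number of days since the patient's first claim."""
    start = min(service_dates)
    return [time_encoding((d - start).days, d_model) for d in service_dates]


if __name__ == "__main__":
    # Two hypothetical patients with the same number of claims but very different spacing.
    dense = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)]
    sparse = [date(2020, 1, 1), date(2020, 3, 1), date(2020, 9, 1)]
    a = encode_claim_times(dense)
    b = encode_claim_times(sparse)
    # Index-based positions (0, 1, 2) would be identical for both patients;
    # the time-based encodings differ from the second claim onward.
    print(a[1] != b[1])  # True
```

With index-based positions, both hypothetical patients would receive identical encodings even though one visited on consecutive days and the other months apart; encoding elapsed days exposes that gap to the model.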




Distributed representation of patients and its use for medical cost prediction

Efficient representation of patients is very important in the healthcare...

TAPER: Time-Aware Patient EHR Representation

Effective representation learning of electronic health records is a chal...

Suicide Risk Modeling with Uncertain Diagnostic Records

Motivated by the pressing need for suicide prevention through improving ...

Sequential Diagnosis Prediction with Transformer and Ontological Representation

Sequential diagnosis prediction on the Electronic Health Record (EHR) ha...

Phenotype Detection in Real World Data via Online MixEHR Algorithm

Understanding patterns of diagnoses, medications, procedures, and labora...

Explainable Health Risk Predictor with Transformer-based Medicare Claim Encoder

In 2019, the Centers for Medicare and Medicaid Services (CMS) launched a...