
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data

04/07/2021
by Tingyi Wanyan, et al.

Electronic Health Record (EHR) data has been of tremendous utility in Artificial Intelligence (AI) for healthcare, for example in predicting future clinical events. These tasks, however, pose many challenges for classical machine learning models, owing to factors that include class imbalance and data heterogeneity (i.e., complex intra-class variance). To address some of these research gaps, this paper leverages the contrastive learning framework and proposes a novel contrastive regularized clinical classification model. The contrastive loss is found to substantially improve EHR-based prediction: it effectively characterizes similar and dissimilar patterns (through its "push-and-pull" form), while mitigating the highly skewed class distribution by learning a more balanced feature space (as also echoed by recent findings). When contrastive learning is applied naively to EHR data, one hurdle is generating positive samples, since EHR data is not as amenable to data augmentation as image data. To this end, we introduce two positive sampling strategies specifically tailored to EHR data: a feature-based positive sampling that exploits the neighborhood structure of the feature space to reinforce feature learning, and an attribute-based positive sampling that incorporates pre-generated patient similarity metrics to define sample proximity. Both sampling approaches are designed with an awareness of the high intra-class variance characteristic of EHR data. Our overall framework yields highly competitive experimental results in predicting mortality risk on real-world COVID-19 EHR data from a total of 5,712 patients admitted to a large, urban health system. Specifically, our method reaches an AUROC of 0.959, which outperforms the other baselines and alternatives: cross-entropy (0.873) and focal loss (0.931).
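The idea of a contrastive regularized classifier with feature-based positive sampling can be illustrated with a small sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: the function names, the InfoNCE-style form of the contrastive term, the choice of the single nearest same-class neighbor as the positive, the use of all other-class samples as negatives, and the weight `lam` and `temperature` values. It is not the authors' implementation.

```python
import numpy as np


def nearest_same_class_positive(z, y):
    """Feature-based positive sampling (illustrative assumption):
    for each anchor, return the index of its most similar same-class
    embedding, excluding itself. Assumes every class has >= 2 samples."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T                       # cosine similarities
    np.fill_diagonal(sim, -np.inf)      # an anchor cannot be its own positive
    same_class = y[:, None] == y[None, :]
    return np.where(same_class, sim, -np.inf).argmax(axis=1)


def contrastive_term(z, y, temperature=0.1):
    """InfoNCE-style 'push-and-pull' term: pull each anchor toward its
    sampled positive, push it away from all samples of other classes."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    pos = nearest_same_class_positive(z, y)
    losses = []
    for i in range(len(y)):
        negatives = sim[i, y != y[i]]
        logits = np.concatenate(([sim[i, pos[i]]], negatives))
        m = logits.max()                # stabilize the log-sum-exp
        log_denom = m + np.log(np.exp(logits - m).sum())
        losses.append(log_denom - logits[0])
    return float(np.mean(losses))


def binary_cross_entropy(p, y, eps=1e-12):
    """Standard binary cross-entropy on predicted probabilities p."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def total_loss(z, p, y, lam=0.5, temperature=0.1):
    """Classification loss regularized by the contrastive term.
    lam is a hypothetical weighting hyperparameter."""
    return binary_cross_entropy(p, y) + lam * contrastive_term(z, y, temperature)


if __name__ == "__main__":
    # Toy embeddings: two tight clusters, one per class.
    z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    y = np.array([0, 0, 1, 1])
    p = np.array([0.1, 0.2, 0.9, 0.8])  # predicted mortality probabilities
    print(total_loss(z, p, y))
```

In this sketch the contrastive term shrinks as same-class embeddings cluster together and other-class embeddings move apart. An attribute-based variant could replace `nearest_same_class_positive` with a lookup into a precomputed patient similarity matrix, again as an assumption about how such a strategy might be wired in.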

