Deep Contextual Clinical Prediction with Reverse Distillation

07/10/2020
by Rohan S. Kodialam, et al.

Healthcare providers are increasingly using learned methods to predict and understand long-term patient outcomes in order to make meaningful interventions. However, despite innovations in this area, deep learning models often struggle to match the performance of shallow linear models in predicting these outcomes, making it difficult to leverage such techniques in practice. In this work, motivated by the task of clinical prediction from insurance claims, we present a new technique called reverse distillation, which pretrains deep models using high-performing linear models for initialization. We make use of the longitudinal structure of insurance claims datasets to develop Self Attention with Reverse Distillation, or SARD, an architecture that combines contextual embedding, temporal embedding, and self-attention mechanisms and, most critically, is trained via reverse distillation. SARD outperforms state-of-the-art methods on multiple clinical prediction outcomes, with ablation studies revealing that reverse distillation is a primary driver of these improvements.
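
The paper's code is not reproduced here, but the core training recipe is simple to sketch. The snippet below is a minimal illustration of the reverse-distillation idea, assuming PyTorch and a binary outcome: a deep model is first pretrained to match the predicted probabilities of an already-fitted linear model (the "teacher"), and is then fine-tuned on the true labels. All function and variable names are hypothetical, and `x` stands in for whatever features both models consume.

```python
import torch
import torch.nn as nn

def reverse_distillation_pretrain(deep_model, linear_model, loader, epochs=5):
    """Pretrain the deep model to mimic a high-performing linear model."""
    opt = torch.optim.Adam(deep_model.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()  # accepts soft (probability) targets
    linear_model.eval()
    for _ in range(epochs):
        for x, _y in loader:  # true labels are unused at this stage
            with torch.no_grad():
                teacher_prob = torch.sigmoid(linear_model(x))
            # Matching the teacher's probabilities initializes the deep
            # model near the linear decision surface before fine-tuning.
            loss = bce(deep_model(x), teacher_prob)
            opt.zero_grad()
            loss.backward()
            opt.step()

def fine_tune(deep_model, loader, epochs=5):
    """After reverse distillation, train on the true outcome labels."""
    opt = torch.optim.Adam(deep_model.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            loss = bce(deep_model(x), y.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
```

In SARD itself the deep model is the self-attention architecture over longitudinal claims described above; this sketch omits those architectural details and shows only the two-stage training order that the ablation studies identify as a primary driver of the gains.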

Related research

11/13/2019 · ZiMM: a deep learning model for long term adverse events with non-clinical claims data
This paper considers the problem of modeling long-term adverse events fo...

11/03/2018 · Learning Contextual Hierarchical Structure of Medical Concepts with Poincaré Embeddings to Clarify Phenotypes
Biomedical association studies are increasingly done using clinical conc...

08/02/2019 · Learning Lightweight Lane Detection CNNs by Self Attention Distillation
Training deep models for lane detection is challenging due to the very s...

11/13/2019 · SAVEHR: Self Attention Vector Representations for EHR based Personalized Chronic Disease Onset Prediction and Interpretability
Chronic disease progression is emerging as an important area of investme...

06/13/2019 · Linear Distillation Learning
Deep Linear Networks do not have expressive power but they are mathemati...

06/04/2021 · Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning
We challenge a common assumption underlying most supervised deep learnin...

12/12/2022 · Mortality Prediction Models with Clinical Notes Using Sparse Attention at the Word and Sentence Levels
Intensive Care in-hospital mortality prediction has various clinical app...
