Prediction-Constrained Topic Models for Antidepressant Recommendation

12/01/2017
by Michael C. Hughes, et al.

Supervisory signals can help topic models discover low-dimensional data representations that are more interpretable for clinical tasks. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals because they do not properly handle a fundamental asymmetry: the intended task is always predicting labels from data, not data from labels. Our new prediction-constrained objective trains models that predict labels well from held-out data while also producing good generative likelihoods and interpretable topic-word parameters. In a case study on predicting depression medications from electronic health records, we demonstrate improved recommendations compared to previous supervised topic models and to high-dimensional logistic regression from words alone.
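The balance between the two goals described above can be illustrated with a toy weighted objective. The sketch below is not the authors' implementation: it assumes the trade-off is written as a generative loss plus a multiplier `lam` times a prediction loss, assumes binary labels with a logistic predictor, and takes the document-topic proportions as given rather than inferring them from the words. All names (`pc_objective`, `doc_topic`, `topic_word`, `weights`, `lam`) and the example numbers are hypothetical.

```python
import numpy as np

def pc_objective(counts, doc_topic, topic_word, weights, labels, lam):
    """Toy objective (assumed form): generative loss + lam * prediction loss."""
    # Per-document word probabilities: mix topics by document-topic proportions.
    word_probs = doc_topic @ topic_word                  # shape (D, V)
    # Negative log-likelihood of observed word counts (up to a constant).
    gen_nll = -np.sum(counts * np.log(word_probs))
    # Labels are predicted from the topic proportions only, never the reverse.
    logits = doc_topic @ weights                         # shape (D,)
    # Logistic negative log-likelihood: log(1 + exp(z)) - y * z.
    pred_nll = np.sum(np.logaddexp(0.0, logits) - labels * logits)
    return gen_nll + lam * pred_nll, gen_nll, pred_nll

# Hypothetical data: 2 documents, 3 vocabulary words, 2 topics.
counts = np.array([[3.0, 0.0, 1.0], [0.0, 2.0, 2.0]])
doc_topic = np.array([[0.9, 0.1], [0.2, 0.8]])
topic_word = np.array([[0.7, 0.1, 0.2], [0.1, 0.5, 0.4]])
weights = np.array([2.0, -2.0])
labels = np.array([1.0, 0.0])

total, gen_nll, pred_nll = pc_objective(
    counts, doc_topic, topic_word, weights, labels, lam=10.0
)
print(total, gen_nll, pred_nll)
```

Under this assumed form, `lam = 0` recovers a purely generative objective, and larger values of `lam` put more weight on label prediction.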

research
07/23/2017

Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models

Supervisory signals have the potential to make low-dimensional data repr...
research
11/15/2019

Prediction Focused Topic Models for Electronic Health Records

Electronic Health Record (EHR) data can be represented as discrete count...
research
12/13/2020

Inference for the Case Probability in High-dimensional Logistic Regression

Labeling patients in electronic health records with respect to their sta...
research
10/12/2019

Prediction Focused Topic Models via Vocab Selection

Supervised topic models are often sought to balance prediction quality a...
research
11/18/2017

Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference

The anchor words algorithm performs provably efficient topic model infer...
research
07/28/2016

Preterm Birth Prediction: Deriving Stable and Interpretable Rules from High Dimensional Data

Preterm births occur at an alarming rate of 10-15 risk of infant mortali...
research
04/12/2022

Hybrid Feature- and Similarity-Based Models for Prediction and Interpretation using Large-Scale Observational Data

Introduction: Large-scale electronic health record(EHR) datasets often i...
