Hybrid Feature- and Similarity-Based Models for Prediction and Interpretation using Large-Scale Observational Data

04/12/2022
by Jacqueline K. Kueper, et al.

Introduction: Large-scale electronic health record (EHR) datasets often include simple informative features, such as patient age, alongside complex data, such as care history, that are not easily represented as individual features. Such complex data have the potential both to improve the quality of risk assessment and to enable a better understanding of the causal factors leading to those risks. We propose a hybrid feature- and similarity-based model for supervised learning that combines feature and kernel learning approaches to take advantage of rich but heterogeneous observational data sources, creating interpretable models for prediction and for investigation of causal relationships.

Methods: The proposed hybrid model is fit by convex optimization with a sparsity-inducing penalty on the kernel portion. Feature and kernel coefficients can be fit sequentially or simultaneously. We compared our models to solely feature-based and solely similarity-based approaches using synthetic data and using EHR data from a primary health care organization to predict risk of loneliness or social isolation. We also present a new strategy for kernel construction that is suited to high-dimensional indicator-coded EHR data.

Results: The hybrid models had comparable or better predictive performance than the feature- and kernel-based approaches in both the synthetic and clinical case studies. In the clinical case study, the inherent interpretability of the hybrid model is used to explore client characteristics stratified by kernel coefficient direction. We also use simple examples to discuss opportunities and cautions of the two hybrid model forms when causal interpretations are desired.

Conclusion: Hybrid feature- and similarity-based models provide an opportunity to capture complex, high-dimensional data within an additive model structure that supports improved prediction and interpretation relative to simple models and opaque complex models.
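To make the Methods summary concrete, the following is a minimal sketch of one way an additive feature-plus-kernel model with a sparsity-inducing penalty on the kernel coefficients could be fit simultaneously. Everything specific here is an assumption for illustration and is not taken from the paper: the squared-error loss, the L1 penalty, the RBF kernel, the proximal gradient solver, and the names `rbf_kernel`, `soft_threshold`, `fit_hybrid`, `lam`, and `gamma`.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_hybrid(X, K, y, lam=0.1, n_iter=2000):
    """Minimize (1/2n) * ||y - X b - K a||^2 + lam * ||a||_1 (assumed objective).

    Only the kernel coefficients `a` are penalized; the feature
    coefficients `b` are left unpenalized.
    """
    n, p = X.shape
    Z = np.hstack([X, K])
    # Step size 1/L, with L the Lipschitz constant of the smooth part.
    L = np.linalg.norm(Z, 2) ** 2 / n
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ w - y) / n
        w = w - grad / L
        w[p:] = soft_threshold(w[p:], lam / L)  # shrink kernel part only
    return w[:p], w[p:]

# Synthetic illustration (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one simple feature
C = rng.normal(size=(n, 3))                            # stand-in for complex data
K = rbf_kernel(C, C, gamma=0.5)                        # similarity to every training row
y = 1.0 + 2.0 * X[:, 1] + 1.5 * K[:, 0] + 0.1 * rng.normal(size=n)

beta, alpha = fit_hybrid(X, K, y, lam=0.05)
print("feature coefficients:", beta)
print("nonzero kernel coefficients:", int(np.count_nonzero(alpha)))
```

Because the assumed objective is convex, both blocks of coefficients can be updated together as above. A sequential variant would first fit `b` on `X` alone and then fit `a` on the residuals.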

research
04/08/2014

A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data

Predicting an individual's risk of experiencing a future clinical outcom...
research
10/10/2018

Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record

The wide implementation of electronic health record (EHR) systems facili...
research
02/19/2018

Simultaneous Modeling of Multiple Complications for Risk Profiling in Diabetes Care

Type 2 diabetes mellitus (T2DM) is a chronic disease that often results ...
research
12/01/2018

Measuring the Stability of EHR- and EKG-based Predictive Models

Databases of electronic health records (EHRs) are increasingly used to i...
research
12/01/2017

Prediction-Constrained Topic Models for Antidepressant Recommendation

Supervisory signals can help topic models discover low-dimensional data ...
research
05/17/2023

Risk Assessment of Lymph Node Metastases in Endometrial Cancer Patients: A Causal Approach

Assessing the pre-operative risk of lymph node metastases in endometrial...
