Learning Patient Static Information from Time-series EHR and an Approach for Safeguarding Privacy and Fairness

09/20/2023
by Wei Liao, et al.

Recent work in machine learning for healthcare has raised concerns about patient privacy and algorithmic fairness. For example, previous work has shown that patient self-reported race can be predicted from medical data that does not explicitly contain racial information. However, the extent of such identification is unknown, and we lack ways to develop models whose outcomes are minimally affected by this information. Here we systematically investigate the ability of time-series electronic health record (EHR) data to predict patient static information. We find that not only the raw time-series data but also the representations learned by machine learning models can be used to train predictors of a variety of static information, with area under the receiver operating characteristic curve (AUROC) as high as 0.851 for biological sex, 0.869 for binarized age, and 0.810 for self-reported race. Such high predictive performance extends to a wide range of comorbidity factors and persists even when the model is trained for different tasks, on different cohorts, with different model architectures, and on different databases. Given the privacy and fairness concerns these findings raise, we develop a variational autoencoder-based approach that learns a structured latent space to disentangle patient-sensitive attributes from time-series data. Our work thoroughly investigates the ability of machine learning models to encode patient static information from time-series electronic health records and introduces a general approach to protect patient-sensitive attribute information for downstream tasks.
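
The results above are reported as area under the receiver operating characteristic curve. As a minimal sketch of what that number measures, the snippet below computes it as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name `auroc` and the toy labels and scores are hypothetical illustrations, not the paper's code or data.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: binary attribute labels and the scores
# a probe classifier might assign to six patients.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auroc(labels, scores))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so values above 0.8 indicate that the static attribute is largely recoverable from the inputs. This pairwise form is quadratic in the number of examples; for large cohorts a rank-based implementation is the usual choice.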


