## 1 Introduction

Real-world healthcare datasets across acute and community care settings exhibit a high prevalence of missing values. As a result, predictive analysis with these data is often challenging, and insights mined from them may not be reliable liptonmodeling51310. The problem is especially cumbersome for clinical time series data comprising longitudinal records of patient state, and it has spurred a longstanding interest in missing data imputation methods 5; BayesianMissing_2003; review_2010.

Traditional approaches for such problems have relied on statistical models and associated Bayesian inference paradigms multiple_20119hastie2005elements, but these require strong constraints on the data-generating process, and treat imputation and prediction as independent tasks 3; multiple_2011; EHRD_2013. To overcome these limitations, recent works have proposed deep learning approaches using recurrent neural networks 5; ML4H; Lipton2016; 9; 1. While these methods learn directly from data without imposing specific assumptions on the underlying processes, and show promise for accurate imputation and prediction on clinical time series, they provide deterministic outputs and neglect the uncertainty inherent to the task.

In this work, we present a unified Bayesian framework for imputation and prediction with multivariate clinical time series. We embed a Bayesian Recurrent Neural Network and a Bayesian Neural Network within a recurrent dynamical system for integrative missing value imputation and prediction. We characterize performance on mortality prediction tasks with the publicly available MIMIC-III 6 and PhysioNet 7 benchmark data sets, and demonstrate performance improvements over state-of-the-art baselines. We further show strong correlations between the variability and accuracy of the imputations. These results suggest that our approach adapts the imputation model to the uncertainty inherent in the modelling task, and offers a principled way to assess the reliability of imputations and predictions.

## 2 Methods

Problem Formulation: We consider a dataset of $N$ samples. Each sample is denoted as an input-output pair $(X, y)$, where $X$ denotes a multivariate time series input with $D$ features and $y$ denotes its output. Each feature $d$ is a sequence of observations $x_t^d$ over time steps $t = 1, \dots, T$. In practice, $X$ may have missing values. A masking matrix $M$ represents the presence of missing values: $m_t^d = 0$ when $x_t^d$ is missing, otherwise $m_t^d = 1$. The objectives are: (a) impute the missing values in $X$ and (b) predict $y$ given $X$.

Bayesian Recurrent Framework: To address the above objectives, prior works leverage recurrent dynamical systems for deterministic models 5; Lipton2016. Here, we augment such recurrent dynamical systems with Bayesian approaches to model the uncertainty in the imputation and prediction tasks. Fig. 1 illustrates our proposed Bayesian recurrent framework for imputation and prediction.

At each time step, the input is fed through the masking layer to a Bayesian recurrent neural network. The RNN hidden state dynamics are specified as:

(1) |

where $h_t$ is the RNN hidden state at time $t$ and $W_h$ denotes the weights for the recurrent layer. As the inputs may not always be regularly sampled over time, we incorporate a temporal decay factor in the hidden state dynamics 5. To model the uncertainty in the hidden state, the Bayesian RNN places a probability distribution over the weights, instead of using fixed values 8.
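The temporal decay is only referenced here, so the following is a minimal sketch of how such a factor is commonly computed, assuming the GRU-D-style formulation of the cited work: the decay depends on the time elapsed since each feature was last observed. The function names, `W_gamma`, `b_gamma`, and the unit (hourly) step are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def time_gaps(mask, step=1.0):
    """Time elapsed since each feature was last observed.

    mask has shape (T, D) with 1 for observed and 0 for missing entries.
    """
    T, D = mask.shape
    delta = np.zeros((T, D))
    for t in range(1, T):
        # The gap resets to one step if the feature was observed at t-1,
        # and keeps accumulating otherwise.
        delta[t] = step + (1 - mask[t - 1]) * delta[t - 1]
    return delta

def hidden_decay(delta_t, W_gamma, b_gamma):
    """Decay factor exp(-max(0, W_gamma @ delta_t + b_gamma)), in (0, 1]."""
    return np.exp(-np.maximum(0.0, W_gamma @ delta_t + b_gamma))

# Toy example: 4 time steps, 2 features, hidden size 3 (all values hypothetical).
mask = np.array([[1, 1], [0, 1], [0, 0], [1, 1]], dtype=float)
delta = time_gaps(mask)
W_gamma = np.full((3, 2), 0.1)
b_gamma = np.zeros(3)
h_prev = np.ones(3)

gamma = hidden_decay(delta[3], W_gamma, b_gamma)
h_decayed = gamma * h_prev  # decayed state passed on to the recurrent update
```

Longer gaps give smaller decay factors, so the previous hidden state contributes less when the inputs are stale.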

If $x_t$ has missing values for any feature, we impute the missing values and replace them with:

(2) |

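The imputation function implicitly models the correlation amongst features 1. To model the uncertainty in the imputed values, we use a Bayesian multilayer perceptron that considers a probabilistic distribution for the weights 8. We apply the imputations when the masking matrix indicates missing values. Thus, across time steps, the updated input is:

$$\tilde{x}_t = m_t \odot x_t + (1 - m_t) \odot \hat{x}_t \tag{3}$$

where $m_t$ is the mask at time $t$ (1 for observed entries, 0 for missing ones), $\hat{x}_t$ denotes the imputed values from Eq. (2), and $\odot$ denotes element-wise multiplication.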
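As an illustration of the masking step, the sketch below keeps observed entries and fills the missing ones from an imputation. It assumes the mask is 1 for observed and 0 for missing entries and that missing inputs are stored as NaN; the function name `combine` and the toy values are hypothetical.

```python
import numpy as np

def combine(x, x_hat, mask):
    """Keep observed entries of x and take missing ones from x_hat."""
    # Zero out NaN placeholders first, since NaN * 0 would still be NaN.
    x_obs = np.nan_to_num(x, nan=0.0)
    return mask * x_obs + (1 - mask) * x_hat

x = np.array([[1.0, np.nan], [np.nan, 4.0]])      # raw input with gaps
mask = (~np.isnan(x)).astype(float)               # 1 = observed, 0 = missing
x_hat = np.array([[0.9, 2.1], [2.8, 4.2]])        # imputations for every entry
x_updated = combine(x, x_hat, mask)
```

Observed values pass through unchanged, so the imputation model only influences the entries that were actually missing.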

With the updated input and the RNN hidden states, the predicted output is:

(4) |

We use a linear form for the prediction function in Eq. (4). The above equations specify the overall recurrent dynamical system. Then, to obtain the imputations and predictions for a given input, we need to compute the posterior distribution of the overall weights given the data. As the true posterior distribution is intractable in general, we approximate it using Bayes by Backprop 8. Conceptually, Bayes by Backprop minimizes the Kullback-Leibler (KL) divergence between the approximate distribution and the true posterior. As such, the loss function for our estimation comprises not only the imputation and prediction errors, but also the KL-divergence loss, as below:

(5) | |||||

(6) | |||||

(7) | |||||

(8) |

Here, the prior distribution for the weights is set as a mixture of Gaussians. We highlight that the imputation loss only considers performance for sample values that are *not* missing. The loss function controls the imputation, prediction and posterior distribution of the weights simultaneously.

Minimizing the loss function, we obtain the posterior of the weights as well as the imputed values and output predictions. Our proposed framework adapts jointly to the uncertainty in the imputation and prediction process, functions as a regularizer to improve robustness, and also provides distributions of the resulting estimates for further study.
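To make the three loss terms concrete, the NumPy sketch below evaluates a masked imputation error, a prediction error, and a single-sample Bayes by Backprop estimate of the KL term. It is an illustration under assumptions rather than the authors' implementation: absolute error for imputation, binary cross-entropy for prediction, a diagonal Gaussian approximate posterior, and a two-component zero-mean Gaussian mixture prior are all assumed, and every function name and hyperparameter value (`pi`, `sigma1`, `sigma2`, `kl_weight`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    return np.log1p(np.exp(z))

def sample_weights(mu, rho):
    """Reparameterised draw w = mu + softplus(rho) * eps, with eps ~ N(0, I)."""
    sigma = softplus(rho)
    return mu + sigma * rng.standard_normal(mu.shape), sigma

def log_gaussian(w, mean, sigma):
    """Element-wise log density of N(mean, sigma^2)."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((w - mean) / sigma) ** 2

def log_mixture_prior(w, pi=0.5, sigma1=1.0, sigma2=0.1):
    """Log density of a two-component Gaussian mixture prior, summed over weights."""
    a = np.log(pi) + log_gaussian(w, 0.0, sigma1)
    b = np.log(1 - pi) + log_gaussian(w, 0.0, sigma2)
    return np.logaddexp(a, b).sum()

def kl_estimate(w, mu, sigma):
    """Single-sample Monte Carlo estimate of KL(q || prior): log q(w) - log p(w)."""
    return log_gaussian(w, mu, sigma).sum() - log_mixture_prior(w)

def imputation_loss(x, x_hat, mask):
    """Mean absolute error over observed entries only (mask == 1)."""
    return (mask * np.abs(np.nan_to_num(x) - x_hat)).sum() / mask.sum()

def prediction_loss(y, p, eps=1e-7):
    """Binary cross-entropy between labels y and predicted probabilities p."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def total_loss(x, x_hat, mask, y, p, w, mu, sigma, kl_weight=1.0):
    """Sum of the imputation, prediction and (weighted) KL terms."""
    return (imputation_loss(x, x_hat, mask)
            + prediction_loss(y, p)
            + kl_weight * kl_estimate(w, mu, sigma))

# Toy values (hypothetical) to exercise the functions.
x = np.array([[1.0, np.nan], [3.0, 4.0]])
mask = (~np.isnan(x)).astype(float)
x_hat = np.array([[1.5, 9.0], [3.0, 3.0]])
y = np.array([1.0, 0.0])
p = np.array([0.9, 0.2])
mu, rho = np.zeros(5), np.full(5, -3.0)
w, sigma = sample_weights(mu, rho)
loss = total_loss(x, x_hat, mask, y, p, w, mu, sigma)
```

Because the imputation term is multiplied by the mask, the estimate at the missing position (9.0 in the toy example) has no effect on it; only the three observed entries contribute.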

## 3 Experiments and Results

Data: To evaluate our approach, we perform experiments on mortality prediction tasks using benchmark data sets from the PhysioNet 2012 Challenge 7 and the MIMIC-III collection 6; 3. These data sets comprise multivariate time series of clinical features recorded from patients in the intensive care unit (ICU). For PhysioNet, the input comprises 35 numerical features in time series samples from 4,000 admissions. For MIMIC-III, the input comprises 12 numerical features in time series samples from 14,681 admissions. In both cases, each input sample comprises 48 hourly time steps and the output is in-hospital mortality. We note that both data sets are sparse, with 78% and 48% of the values missing for PhysioNet and MIMIC-III, respectively.

Performance Metrics: To enable evaluation of imputation performance, we simulate missingness at random (MAR) by eliminating 10% of the known input values 1. For this subset of data with simulated MAR, we evaluate imputation performance by computing the mean absolute error (MAE) and mean relative error (MRE). Further, we hold out a random 20% of all data samples as a test set and evaluate prediction performance by computing the areas under the receiver operating characteristic and precision-recall curves (AUROC, AUPRC).
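The imputation metrics can be sketched as follows. This is a minimal illustration, assuming that MRE is the summed absolute error divided by the summed absolute ground truth over the held-out entries (a common definition in the imputation literature, not stated explicitly here); the function names and the random seed are hypothetical.

```python
import numpy as np

def simulate_missing(mask, rate=0.1, seed=0):
    """Hide a random fraction of the observed entries.

    Returns (train_mask, eval_mask): entries still visible to the model,
    and entries held out as ground truth for evaluating imputation.
    """
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(mask)                 # flat indices of known values
    n_hide = int(round(rate * observed.size))
    hidden = rng.choice(observed, size=n_hide, replace=False)
    eval_mask = np.zeros(mask.size, dtype=bool)
    eval_mask[hidden] = True
    eval_mask = eval_mask.reshape(mask.shape)
    train_mask = mask.astype(bool) & ~eval_mask
    return train_mask, eval_mask

def mae(x_true, x_hat, eval_mask):
    """Mean absolute error over the held-out entries."""
    return np.abs(x_true - x_hat)[eval_mask].mean()

def mre(x_true, x_hat, eval_mask):
    """Summed absolute error relative to summed absolute ground truth."""
    err = np.abs(x_true - x_hat)[eval_mask].sum()
    return err / np.abs(x_true)[eval_mask].sum()

# Toy example with a fully observed 10 x 10 input.
full_mask = np.ones((10, 10))
train_mask, eval_mask = simulate_missing(full_mask, rate=0.1)
```

Only the entries flagged in `eval_mask` enter the metrics, since those are the only imputed positions for which the true value is known.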

Baselines: We compare our method against three state-of-the-art RNN-based imputation methods: (a) Gated Recurrent Unit with Decay (GRU-D) 5; (b) Recurrent Imputation in Time Series (RITS) 1; (c) Bayesian Recurrent Neural Network (BRNN), which refers to a Bayesian RNN 2 with all missing values imputed with zero. Unlike our method, wherein the temporal decay factor only affects hidden states, the GRU-D baseline applies decay factors to both the input and the hidden state dynamics. The RITS baseline considers a vanilla RNN without a Bayesian framework. Finally, the BRNN baseline helps evaluate the impact of a Bayesian approach for prediction, independently of data-driven imputation.

Performance Results: Table 1 provides the performance results. Where previously reported results exist, we include them with citations. We also repeat the experiments for fair comparison, as required. We observe that our method outperforms the state-of-the-art techniques, beating the closest SOTA method by up to 2% in imputation MRE and up to 3.3% in prediction AUPRC. Our imputation performance improvement is less prominent in the MIMIC-III dataset, possibly because this dataset is less sparse than PhysioNet. We also characterized the imputation performance with increasing prevalence of simulated MAR for PhysioNet. With increasing MAR rates, our method offered increasing performance improvement over the closest SOTA method (RITS). Even with a 15% increase in simulated MAR, our approach outperformed RITS by 3.5% MRE.

Impact of Distributions: One advantage of our Bayesian framework is that it provides a distribution of imputations and predictions. First, for each sample with simulated MAR, we used Monte Carlo iterations to obtain a distribution of the imputed values (Fig. 2A). By visualizing the ground truth value atop the distribution (blue point), we can assess how far removed the distribution is from the ground truth. Second, we studied the variability implicit in the imputation process. For each simulated missing value, we obtain the variance of the distribution of imputed values (e.g., red line). We then sort the variances of all imputations in ascending order (Fig. 2B), eliminate the missing values with the highest variances, and assess the impact on imputation MAE. We note, for example, that removing the missing values whose variances fall in the top 40 percent leads to lower MAE than when all missing values are considered. This suggests that there is a monotonic relation between the accuracy and variability of imputation. Third, we break down this accuracy vs. variability relation to the individual feature level (Fig. 2C). The same trend carries through, wherein features with higher variability in imputed values (e.g., weight, height and inspired oxygen) tend to have higher MAE. These results suggest that, in real-world scenarios where there is no ground truth, the variance can serve as a means to assess the reliability of the imputed values.

| Dataset | Methods | MAE | MRE | AUROC | AUPRC |
|---|---|---|---|---|---|
| PhysioNet | GRU-D 1 | | | | |
| | RITS 1 | | | | |
| | BRNN | | | | |
| | RITS | | | | |
| | Ours | 0.282 | 0.398 | 0.866 | 0.553 |
| MIMIC-III | GRU-D | | | | |
| | BRNN | | | | |
| | RITS | | | | |
| | Ours | 0.148 | 0.294 | 0.815 | 0.465 |
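The variance-based filtering described under Impact of Distributions can be sketched as follows. The sketch assumes that `samples` holds repeated stochastic imputations of the same held-out values (one row per Monte Carlo draw of the weights); the function name, the `keep_fraction` parameter, and the toy numbers are hypothetical.

```python
import numpy as np

def mae_after_filtering(samples, truth, keep_fraction):
    """MAE of the mean imputation, restricted to the lowest-variance values.

    samples: array of shape (S, K), S Monte Carlo imputations of K values.
    truth:   array of shape (K,), ground truth for the held-out values.
    """
    mean = samples.mean(axis=0)           # point estimate per value
    var = samples.var(axis=0)             # spread across Monte Carlo draws
    order = np.argsort(var)               # ascending variance
    n_keep = max(1, int(round(keep_fraction * truth.size)))
    kept = order[:n_keep]                 # drop the highest-variance values
    return np.abs(mean[kept] - truth[kept]).mean()

# Toy example: 2 draws for 4 held-out values whose true value is 0.
samples = np.array([[0.1, 0.3, 1.0, 4.0],
                    [0.1, 0.1, 2.0, 0.0]])
truth = np.zeros(4)

mae_all = mae_after_filtering(samples, truth, keep_fraction=1.0)
mae_low_var = mae_after_filtering(samples, truth, keep_fraction=0.5)
```

In this toy case the two highest-variance values are also the least accurate, so discarding them lowers the MAE, which mirrors the qualitative trend reported above.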

## 4 Discussion and Future Work

We have developed a Bayesian recurrent framework to enable missing data imputation and prediction on clinical time series data sets. Our approach improves both imputation and prediction performance and is robust to increasing MAR. Further, by providing explicit probability distributions of imputed values and output predictions, it enables assessment of the variability and reliability of the imputed values. This has important implications in real-world scenarios, where ground truth is lacking. Future work will consider extensions to categorical features, varying-length time series, and different missingness patterns such as MNAR (Missing Not At Random) and MCAR (Missing Completely At Random), and will develop more rigorous theoretical grounding.

## Acknowledgement

The authors would like to acknowledge grant funding for Digital Health from the Science and Engineering Research Council, A*STAR, Singapore (Project No. A1818g0044).
