TRACER: A Framework for Facilitating Accurate and Interpretable Analytics for High Stakes Applications

03/24/2020 · Kaiping Zheng, et al. · National University Health System · National University of Singapore

In high stakes applications such as healthcare and finance analytics, the interpretability of predictive models is required and necessary for domain practitioners to trust the predictions. Traditional machine learning models, e.g., logistic regression (LR), are easy to interpret by nature. However, many of these models aggregate time-series data without considering the temporal correlations and variations, so their performance cannot match that of recurrent neural network (RNN) based models, which are in turn difficult to interpret. In this paper, we propose a general framework TRACER to facilitate accurate and interpretable predictions, with a novel model TITV devised for healthcare analytics and other high stakes applications such as financial investment and risk management. Different from LR and other existing RNN-based models, TITV is designed to capture both the time-invariant and the time-variant feature importance, using a feature-wise transformation subnetwork and a self-attention subnetwork for the feature influence shared over the entire time series and the time-related importance respectively. Healthcare analytics is adopted as the driving use case, and we note that the proposed TRACER is also applicable to other domains, e.g., fintech. We evaluate the accuracy of TRACER extensively on two real-world hospital datasets, and our doctors/clinicians further validate the interpretability of TRACER at both the patient level and the feature level. Besides, TRACER is also validated in a high stakes financial application and a critical temperature forecasting application. The experimental results confirm that TRACER facilitates both accurate and interpretable analytics for high stakes applications.







1 Introduction

Footnote: A version of this preprint will appear in ACM SIGMOD 2020.

Database management systems (DBMS) have been extensively deployed to support OLAP-style analytics. In modern applications, an ever-increasing number of data-driven machine learning based analytics have been developed with the support of DBMS for complex analysis [10, 94, 95, 45]. Particularly, there have been growing demands of machine learning for complex high stakes applications, such as healthcare analytics, financial investment, etc. Notably, healthcare analytics is a very important and complex application, which entails data analytics on a selected cohort of patients for tasks such as diagnosis, prognosis, etc. Various healthcare analytic models have been proposed for Electronic Medical Records (EMR) that record visits of patients to the hospital with time-series medical features. In particular, neural network (NN) based models [20, 54] have been shown to improve the accuracy over traditional machine learning models significantly. An accurate analytic model can help healthcare practitioners to make effective and responsive decisions on patient management and resource allocation.

However, accuracy alone is far from satisfactory for healthcare analytics and other high stakes applications in deployment. In this paper, we shall use healthcare analytics as our driving use case. Suppose an accurate model has been trained and deployed for in-hospital mortality prediction.

Simply reporting to the doctors that “for this patient, the probability of mortality estimated by the model is …” is unacceptable: in life-and-death medical decisions, a single number without proper explanations gives doctors no basis for taking interventions. To alleviate this issue, it is essential to devise an interpretable model that explains “why” certain decisions are made [63], e.g., why a given probability of mortality is produced for a specific patient. Such an interpretable model is critical for providing medically meaningful results to doctors, and insightful advice to practitioners in other high stakes applications as well.

Recently, various approaches have been proposed to explain the prediction results of neural network models [42], some of which particularly focus on healthcare analytics [57, 22, 80]. For example, attention modeling has been adopted to compute the visit importance, i.e., the importance of each visit of a patient [57] on the final prediction. Nonetheless, the visit importance only indicates which visit is important, and is thus not adequate to explain how and why the prediction is generated from the “important” visits. Typically, doctors consider one visit to be more important when certain important indicators, i.e., medical features, of that visit deviate far from the normal range. Therefore, the feature importance is more informative than the visit importance, and can be exploited to interpret model predictions.

However, the feature importance modeled in existing work is highly dependent on certain time periods [22, 80], without differentiating the “time-invariant” and “time-variant” feature importance. We note that a medical feature, on the one hand, has an influence on a patient that is shared over the entire time series, which is captured in the time-invariant feature importance; on the other hand, the influence of the feature on a patient can also vary over time periods or visits, which is captured in the time-variant feature importance.

For instance, we examine two representative laboratory tests, i.e., Glycated Hemoglobin (HbA1c) and Blood Urea Nitrogen (Urea), for predicting the risk of patients developing acute kidney injury (AKI). We first train a single logistic regression (LR) model by aggregating each medical feature across all the seven-day data (see the NUH-AKI dataset in Section 5.1.1). The learned weight coefficient of each feature thus denotes its time-invariant importance, reflecting the influence of that feature on the final prediction. We also train seven LR models independently, one on the data of each day; the learned coefficients of each day are regarded as the time-variant feature importance. As shown in Figure 1 (in each LR model, either time-invariant or time-variant, after training the model and obtaining the coefficients, we normalize the coefficients of all features via a Softmax function for illustration), HbA1c and Urea have different time-invariant feature importance, and our doctors confirm that Urea is indeed more important in diagnosing kidney diseases [9]. Moreover, the time-variant feature importance increases over days for Urea while it fluctuates for HbA1c. The explanation from our doctors is that Urea is a key indicator of kidney dysfunction, so its importance grows as the AKI prediction time approaches. The same phenomenon can be found in Estimated Glomerular Filtration Rate (eGFR) [26], which serves as a measure for assessing kidney function. In contrast, HbA1c [31] is typically used to assess the risk of developing diabetes; hence, it has a relatively low time-invariant feature importance and a stable time-variant feature importance in this prediction.

Figure 1: The normalized coefficients in both an LR model trained on the aggregated seven-day data (leftmost) and seven LR models trained separately. We illustrate with two representative laboratory tests HbA1c and Urea (best viewed in color).
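The per-day versus aggregated LR analysis above can be sketched as follows. The data here is synthetic, and the two columns merely stand in for HbA1c and Urea; the synthetic labels are constructed so that the second feature matters more on later days, loosely mimicking the pattern in Figure 1.

```python
# Sketch of the motivating analysis: one LR model on aggregated seven-day
# data (time-invariant importance) vs. seven per-day LR models
# (time-variant importance). Data is synthetic; columns are stand-ins
# for [HbA1c, Urea].
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_patients, n_days, n_feats = 500, 7, 2
X = rng.normal(size=(n_patients, n_days, n_feats))
# Synthetic labels: the second feature ("Urea") on later days matters more.
logits = (X[:, :, 1] * np.linspace(0.2, 1.0, n_days)).sum(axis=1)
y = (logits + 0.3 * rng.normal(size=n_patients) > 0).astype(int)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Time-invariant: aggregate (mean) each feature across the seven days.
inv = LogisticRegression().fit(X.mean(axis=1), y)
print("time-invariant:", softmax(inv.coef_[0]))

# Time-variant: an independent LR model per day.
for day in range(n_days):
    var = LogisticRegression().fit(X[:, day, :], y)
    print(f"day {day + 1}:", softmax(var.coef_[0]))
```

Under this construction the normalized per-day coefficient of the "Urea" column tends to grow over the seven days, which is the behavior the paper reports for the real NUH-AKI data.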

To produce meaningful interpretation in healthcare analytics and other high stakes applications, we propose TRACER, a general framework to provide accurate and inTerpRetAble Clinical dEcision suppoRt to doctors and practitioners of other domains. Specifically, we devise a novel model TITV for TRACER, which captures both the Time-Invariant and the Time-Variant feature importance for each sample (e.g., each patient in healthcare analytics) in two different subnetworks, with the former shared across time and the latter varying across different time periods. We adopt the feature-wise transformation mechanism known as feature-wise linear modulation (FiLM) [25, 69, 41] in one subnetwork to model the time-invariant feature importance, and a self-attention mechanism in the other subnetwork to model the fine-grained, time-variant feature importance. With the two subnetworks of TITV trained jointly in an end-to-end manner, TRACER can produce accurate predictions and, meanwhile, meaningful explanations in analytics for high stakes applications.

The main contributions can be summarized as follows:


  • We identify the research gap in modeling the feature importance: existing methods are overly dependent on certain time periods, without differentiating the time-invariant and the time-variant feature importance. In contrast, we demonstrate that both time-invariant and time-variant feature importance are essential for high stakes applications such as healthcare analytics.

  • We propose a general framework TRACER to provide both accurate and interpretable decision support to doctors in healthcare analytics, and to practitioners in other high stakes applications. Specifically, in TRACER, we devise a novel model TITV that takes into consideration the time-invariant feature importance via the FiLM mechanism and the time-variant feature importance via the self-attention mechanism.

  • We evaluate the effectiveness of TRACER for healthcare analytics on two real-world hospital datasets. Experimental results confirm that TRACER can produce more accurate prediction results than state-of-the-art baselines. Meanwhile, both patient-level and feature-level interpretation results based on the feature importance of TRACER have been validated by doctors to be medically meaningful, which shows that TRACER can assist doctors in clinical decision making.

  • We illustrate that TRACER is also applicable to other high stakes applications. Specifically, TRACER is further validated in a financial investment application and a temperature forecasting application.

Our proposed TRACER is integrated into GEMINI [53], an end-to-end healthcare data analytics system, to facilitate the accuracy and interpretability of healthcare analytics. The remainder of this paper is organized as follows. In Section 2, we review the related work. In Section 3, we introduce our general framework TRACER in the context of healthcare analytics. In Section 4, we elaborate on the detailed design of the core component of TRACER, the TITV model. In Section 5, we analyze the experimental results of TRACER in healthcare analytics, which demonstrate its effectiveness in terms of both prediction accuracy and interpretation capability. To illustrate that TRACER is also applicable to other high stakes applications, we also conduct experiments on a financial application and a temperature forecasting application in this section. We conclude in Section 6.

2 Related Work

2.1 Healthcare Analytics

Healthcare analytics conducts analytic tasks on patient data, which typically include diagnosis, prognosis, etc. Due to recent advancements in DBMS [84, 73], researchers have managed to achieve more optimized support for healthcare analytics in terms of both effectiveness and efficiency [15]. Through high-quality healthcare analytics, we can provide medical professionals with useful insights on both diseases and patients, hence contributing to better patient management and faster advancement of medical research.

In recent years, EMR data collected from various hospitals and health organizations has become one of the major data sources for healthcare analytics. EMR data is heterogeneous in nature, consisting of information that ranges from patients’ social and demographic information, to structured medical features such as diagnoses and laboratory tests, and further to unstructured data such as magnetic resonance images and doctors’ notes. The structured data can be fed into TRACER directly, and the unstructured data can be converted into structured features before being input to TRACER for analytics.

The typical EMR data analytics pipeline, from EMR Data Acquisition to Interpretation/Visualization, is illustrated in Figure 2, where each module plays a critical role. To begin with, we collect raw EMR data, which may be quite “dirty” and thus unsuitable for direct analytics. We therefore feed the raw EMR data into the EMR Data Integration and EMR Data Cleaning modules to improve the data quality. The clean EMR data then goes through the EMR Analytic Modeling module for analytics, and the analytic results are interpreted and visualized to render the derived medical insights easy to comprehend for medical experts, researchers, patients, etc. Finally, we make use of doctors’ validation and feedback, given based on the interpretation or visualization results, to improve the whole EMR data analytics pipeline. This pipeline is supported by GEMINI [53], a generalizable medical information analysis and integration platform, whose objective is to design and implement an integrative healthcare analytic system to address various kinds of healthcare problems. GEMINI supports several key functionalities, such as capturing the feature-level irregularity in EMR data [97], resolving the bias in EMR data [96], and adaptive regularization [56]. Our proposed TRACER is integrated into GEMINI to facilitate the accuracy and interpretability of healthcare analytics.

Figure 2: EMR data analytics pipeline.

2.2 Interpretability

Interpretability, measuring the degree to which the cause of a decision can be understood by humans [7, 60], has drawn great attention in recent years. When designing analytic models, we face a trade-off between what is predicted (e.g., whether a hospitalized patient will develop AKI in two days and with what probability) and why a decision is made (e.g., which features indicate this patient’s future development of AKI) [63]. The latter question emphasizes interpretability. In high stakes applications with high complexity and demanding requirements, it is not adequate to only know the answer to the what question, because the answer to the why question improves humans’ understanding of the problem, makes humans aware of when the model succeeds or fails, and boosts humans’ trust in the analytic results. Hence, it is essential to take interpretability into consideration when designing analytic models.

2.3 Interpretability in Healthcare

Some traditional machine learning methods can achieve high interpretability [39, 83, 100, 98]; however, because they do not consider the longitudinal property of EMR data, their performance may be degraded.

Deep learning [50] has recently aroused wide interest due to its ability to achieve state-of-the-art performance in a large number of applications [43, 76, 59]. Within deep learning, one category of models, recurrent neural networks (RNN), such as the long short-term memory (LSTM) model and the gated recurrent unit (GRU) model [19], is proposed to capture the dynamic behavior in sequential data. Although effective in modeling time-series data, the lack of interpretability tends to become an obstacle to deploying RNN-based models in healthcare analytics, despite some existing attempts [36, 17]. Fortunately, the attention mechanism [4] has come into being, with which we can visualize the attention weights to facilitate the interpretability of healthcare analytics.

Some researchers have applied the attention mechanism in healthcare analytics [21, 5]; nonetheless, these studies devise attention-based approaches for incorporating domain knowledge to improve prediction accuracy, rather than contributing to the interpretability of analytic results.

Some other researchers employ the attention mechanism to facilitate interpretability. In [57], an attention-based bidirectional RNN model is proposed to provide an interpretation for each visit in diagnosis prediction, but the more fine-grained interpretation for each medical feature is not supported. In [22], RETAIN is proposed to facilitate interpretability via both visit-level attention and feature-level attention; however, these two levels of attention are connected by multiplication, so the feature-level attention already conveys some information of the visit-level attention. Then in [80], GRNN-HA, a model with a hierarchical two-level attention mechanism, is proposed, but its two levels of attention, for visits and for features, are not connected. Furthermore, both RETAIN and GRNN-HA capture only the time-variant feature importance, which is heavily dependent on a certain visit rather than on a patient’s whole time series.

Different from existing work, we highlight the necessity of capturing both the time-invariant and the time-variant feature importance in healthcare analytics at the same time. In our proposed framework TRACER, specifically in the TITV model, we combine the time-invariant feature importance via a FiLM mechanism and the time-variant feature importance via a self-attention mechanism to achieve this goal.

3 TRACER Framework

We propose a general framework TRACER to facilitate accurate and interpretable decision support for healthcare analytics and other high stakes applications. In this section, we take healthcare analytics as an illuminating example to illustrate the architecture overview of TRACER.

As shown in Figure 3, TRACER makes use of both the historical time-series EMR data and the daily generated EMR data, and feeds these data to TITV, the core component of TRACER, for analytics, in which both the time-invariant and the time-variant feature importance are captured in modeling. Based on the analytic results, TRACER supports doctor validation with accurate and interpretable clinical decisions in scenarios including real-time prediction & alert, patient-level interpretation and feature-level interpretation.

Data. TRACER incorporates two data sources: (i) the historical time-series EMR data stored in the EMR database system of hospitals, used to train the analytic models to satisfactory performance; (ii) the daily generated EMR data (e.g., the data collected from a patient newly admitted to the hospital, or new laboratory tests measured for a hospitalized patient), used for inference at a regular frequency to provide real-time predictions and hence support more effective patient monitoring.

TITV model. TITV, the core component of TRACER, takes the data as input to generate predictions based on the collaboration of three modules: (i) Time-Invariant Module computes the time-invariant feature importance representation; (ii) Time-Variant Module computes the time-variant feature importance representation with the guidance of Time-Invariant Module; (iii) Prediction Module takes into account the information of both Time-Invariant Module and Time-Variant Module to generate the final predictions. The design details and interaction of these modules will be elaborated in Section 4.

Figure 3: Overview of TRACER in healthcare analytics.

Doctor Validation. With the TITV model, TRACER facilitates doctor validation with accurate and interpretable clinical decision support. We illustrate three representative scenarios with the application of hospital-acquired AKI prediction as follows.


  • Real-time Prediction & Alert for Daily Consultation. Suppose Doctor X is monitoring the chance of developing AKI for a hospitalized Patient Y on a daily basis. TRACER can help Doctor X by feeding the daily generated EMR data of Patient Y for analysis and computing the probability of Patient Y developing AKI in two days. Once the prediction exceeds a predefined risk threshold, TRACER will send an alert to notify Doctor X that Patient Y is at risk, so that Doctor X can attend to Patient Y in time and take preventive treatments to avoid deterioration. In this scenario, TRACER assists doctors with clinical decision making and contributes to better patient management.

  • Patient-Level Interpretation for Patient-Level Analysis. Following the example above, suppose Doctor X has suggested some treatments to Patient Y in advance and afterward decides to further investigate the EMR data of Patient Y, to find out why he/she has such a probability of developing AKI. In such a scenario, TRACER can provide interpretation analysis of Patient Y based on the historical time-series EMR data, and help identify the specific features of particular visits to the hospital that are responsible for the prediction of developing AKI in the future. With such patient-level interpretation, TRACER improves doctors’ understanding of each patient, and helps identify the underlying reasons for developing certain diseases such as AKI.

  • Feature-Level Interpretation for Medical Research. Suppose Doctor X has analyzed the EMR data of a cohort of patients, all with a relatively high probability (i.e., higher than the threshold) of developing AKI, and has noticed that these patients share certain similarities in the historical time-series EMR data based on the patient-level interpretation, e.g., an increasing importance of the laboratory test “C-Reactive Protein” in recent examinations. Doctor X then decides to further investigate the underlying pattern of this laboratory test with regard to AKI development in the cohort for medical research. TRACER can support the needs of Doctor X with the feature-level interpretation, which depicts the changing pattern of this feature’s importance over time among all patients. In this way, TRACER assists Doctor X in understanding the characteristics of this feature in AKI development and hence contributes to the advancement of medical research.

4 TITV Model

We denote a time-series sample as x = (x_1, x_2, …, x_T) of T time windows, where the window length is flexible, e.g., one day, one hour or one visit. Specifically, for healthcare analytics, each time window t contains the medical features of the patient extracted from his/her EMR data, denoted as x_t ∈ ℝ^d, where d is the number of medical features. Each sample is extracted from a particular patient, while each patient may have more than one sample (the detailed extraction process will be explained in Section 5.1.1). Each sample also has a corresponding label y indicating the medical status of the patient. In this section, we formulate TITV for binary classification, where the label is a binary value (in the experiments for healthcare analytics of Section 5, the binary value indicates whether a patient will develop AKI for hospital-acquired AKI prediction, or pass away for in-hospital mortality prediction). We note that this formulation can be readily extended to other learning tasks such as regression by replacing the output activation function with a linear function.

We shall elaborate on the core component of TRACER, the TITV model. We first introduce the overall architecture of TITV and its three modules: Time-Invariant Module, Time-Variant Module and Prediction Module. We then analyze the importance of each feature to the final prediction of TITV.

4.1 TITV Architecture Overview

With the input sample x, we illustrate the generation of the prediction ŷ from TITV in Figure 4. Specifically, TITV is composed of the following three modules.

Time-Invariant Module. For each sample, the input x is first fed into a bidirectional RNN (BIRNN) [78] to compute a hidden representation h_t for each time window. The h_t from all time windows are then averaged into a summary vector s, which flows into the FiLM generator, a unit that calculates the scaling parameter γ and the shifting parameter β. The scaling parameter γ serves as the time-invariant feature importance of the sample, whose influence is shared across time windows.

Time-Variant Module. We design a FiLM-based BIRNN that uses γ and β from Time-Invariant Module to conduct a feature-wise affine transformation over the input x_t of each time window t. We then compute the hidden representation h̃_t, and feed it to a unit supporting the self-attention mechanism to compute the time-variant feature importance α_t.

Prediction Module. We aggregate the time-invariant feature importance γ and the time-variant feature importance α_t into an overall feature importance e_t. The final context vector c is then obtained by summing the element-wise product of e_t and x_t over each time window t. Finally, the context vector c is used for the prediction of the label ŷ.

We note that the integration of the time-invariant and the time-variant feature importance is non-trivial, and no previous study has investigated the integration of both. Specifically, γ and β computed from Time-Invariant Module will: (i) guide the modulation of the input in Time-Variant Module for calculating α_t, and (ii) integrate γ with α_t in Prediction Module. We shall justify this integration experimentally in Section 5.2.2.

Figure 4: TITV with the collaboration of three modules.

4.2 FiLM-based Time-Invariant Module

We aim to model the time-invariant feature importance shared across time, where data in all the time windows are required and exploited. FiLM is an effective conditioning mechanism that applies a feature-wise affine transformation on the input, and is designed to model the feature importance [25, 69, 41]. We therefore adopt FiLM in Time-Invariant Module for computing the time-invariant feature importance, with the input EMR data as the self-conditioning information.

Specifically, with FiLM, we can obtain the feature-wise scaling parameter γ and shifting parameter β for each sample. The scaling parameter γ represents the time-invariant importance of the features across the entire time range of each sample. The detailed structure of the FiLM-based Time-Invariant Module is illustrated in Figure 5.

We first feed the time-series EMR data into a standard BIRNN model and obtain the hidden representations:

h_t = BiGRU(x_t) = [h_t^fw ; h_t^bw],  t = 1, …, T

where BiGRU refers to a bidirectional GRU model, and the hidden representation h_t is the concatenation of the hidden states computed from both directions (i.e., forward and backward). Specifically, h_t^fw is obtained via a forward GRU model (from x_1 to x_T), and h_t^bw via a backward GRU model (from x_T to x_1). The major advantage of BIRNN lies in its capability to capture both the forward and the backward temporal relationships in the EMR data, which is similar to the procedure in which doctors analyze the history EMR data of a patient from both directions. Consequently, BIRNN provides a comprehensive representation of the time-series EMR data.

We further aggregate the hidden representations of all the time windows into a summary vector s:

s = (1/T) · Σ_{t=1}^{T} h_t
Then this aggregated representation flows into the FiLM generator to calculate the scaling parameter and the shifting parameter:

γ = W_γ · s + b_γ,  β = W_β · s + b_β
Note that γ and β obtained in this module will also serve as auxiliary inputs to Time-Variant Module of TITV for better predictions, as they guide the modulation of the input EMR data in a feature-wise manner. Further, the scaling parameter γ determines the scale of each feature and thus indicates the importance of each feature. For a given sample, γ is shared and fixed through all the time windows. γ therefore serves as the time-invariant feature importance, which is required by and thus integrated into Prediction Module.

Figure 5: Time-Invariant Module of TITV.
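The Time-Invariant Module described above can be sketched in a few lines of PyTorch: a bidirectional GRU encodes the series, the hidden states are mean-pooled into a summary vector, and a FiLM generator maps that summary to the scaling parameter γ and shifting parameter β. The layer sizes and the single linear layer used as the FiLM generator are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch of the FiLM-based Time-Invariant Module (illustrative sizes).
import torch
import torch.nn as nn

class TimeInvariantModule(nn.Module):
    def __init__(self, n_feats: int, hidden: int = 32):
        super().__init__()
        self.birnn = nn.GRU(n_feats, hidden, bidirectional=True, batch_first=True)
        # FiLM generator: summary vector -> [gamma; beta], one pair per feature.
        self.film = nn.Linear(2 * hidden, 2 * n_feats)

    def forward(self, x):                  # x: (batch, T, n_feats)
        h, _ = self.birnn(x)               # (batch, T, 2*hidden)
        s = h.mean(dim=1)                  # summary vector over time windows
        gamma, beta = self.film(s).chunk(2, dim=-1)
        return gamma, beta                 # each: (batch, n_feats)

x = torch.randn(4, 7, 10)                  # 4 samples, 7 windows, 10 features
gamma, beta = TimeInvariantModule(10)(x)
print(gamma.shape, beta.shape)
```

Note that γ and β are per-sample and per-feature but not per-window, matching the "shared and fixed through all time windows" property.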

4.3 Attention-based Time-Variant Module

We aim to differentiate the influence of different features in different time windows when modeling the time-variant feature importance. Similar tasks have been successfully supported with the adoption of the self-attention mechanism in many areas [18, 91]. We therefore introduce the self-attention mechanism to Time-Variant Module (as illustrated in Figure 6) to calculate the time-variant feature importance specific to each time window.

We first feed the time-series data into an adapted BIRNN model, with the auxiliary information from Time-Invariant Module, to calculate the time-variant hidden representations:

h̃_t = FiLM-BiGRU(x_t, γ, β)

where FiLM-BiGRU refers to the FiLM-based bidirectional GRU model, which also takes the scaling parameter γ and the shifting parameter β from Time-Invariant Module as additional input.

Specifically, the detailed transformation of FiLM-BiGRU is given as follows, where the revised update gate z_t, reset gate r_t, temporary hidden state ĥ_t and final hidden state h̃_t are calculated in a bidirectional GRU model in order:

z_t = σ(W_z · FiLM(x_t) + U_z · h̃_{t−1} + b_z)
r_t = σ(W_r · FiLM(x_t) + U_r · h̃_{t−1} + b_r)
ĥ_t = tanh(W_h · FiLM(x_t) + U_h · (r_t ⊙ h̃_{t−1}) + b_h)
h̃_t = (1 − z_t) ⊙ h̃_{t−1} + z_t ⊙ ĥ_t
where σ(·) is the sigmoid activation function and “⊙” denotes element-wise multiplication. Different from the standard bidirectional GRU model, FiLM-BiGRU also exploits γ and β from Time-Invariant Module via a feature-wise affine transformation defined as:

FiLM(x_t) = γ ⊙ x_t + β
We then employ a self-attention mechanism to compute the time-variant feature importance:

α_t = softmax(W_α · h̃_t + b_α)

where α_t of time window t will be fed into Prediction Module to attend to the features of x_t for prediction.

Figure 6: Time-Variant Module of TITV.
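A simplified PyTorch sketch of the Time-Variant Module follows. For brevity, the FiLM conditioning is applied once to the input features before a standard bidirectional GRU, rather than inside every gate as in the adapted cell described above, and the attention head that emits one importance vector α_t per window (here a linear layer plus softmax over features) is an assumed form; sizes are illustrative.

```python
# Simplified sketch of the attention-based Time-Variant Module.
import torch
import torch.nn as nn

class TimeVariantModule(nn.Module):
    def __init__(self, n_feats: int, hidden: int = 32):
        super().__init__()
        self.birnn = nn.GRU(n_feats, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, n_feats)   # h_t -> alpha_t

    def forward(self, x, gamma, beta):     # x: (batch, T, n_feats)
        # Feature-wise affine transformation FiLM(x_t) = gamma ⊙ x_t + beta.
        x_mod = gamma.unsqueeze(1) * x + beta.unsqueeze(1)
        h, _ = self.birnn(x_mod)           # (batch, T, 2*hidden)
        alpha = torch.softmax(self.attn(h), dim=-1)
        return alpha                       # (batch, T, n_feats)

x = torch.randn(4, 7, 10)
gamma, beta = torch.randn(4, 10), torch.randn(4, 10)
alpha = TimeVariantModule(10)(x, gamma, beta)
print(alpha.shape)
```

Unlike γ, the output α_t carries a separate importance vector for every time window, which is what makes it time-variant.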

4.4 Prediction Module

In Prediction Module, we produce the final prediction of TITV as illustrated in Figure 7. After obtaining the scaling parameter γ as the time-invariant feature importance and the self-attention weights α_t as the time-variant feature importance, we combine them into the overall feature importance in Prediction Module:

e_t = γ ⊕ α_t

where “⊕” denotes element-wise summation. Note that γ and α_t are intermediate neural network outputs, which are best combined by direct summation for generally better results [87, 30, 67]. Thereby, we integrate both the general time-invariant feature importance and the fine-grained time-variant feature importance into the final feature importance representation e_t.

We then obtain the context vector c by aggregating the element-wise product of e_t and the corresponding input x_t at each time window t:

c = Σ_{t=1}^{T} e_t ⊙ x_t
The final predicted label ŷ of TITV is therefore:

ŷ = σ(⟨w, c⟩ + b)

where “⟨·,·⟩” denotes the inner product of two vectors. Finally, the training of the whole framework is achieved via the optimization of a predefined loss function L(y, ŷ) between the prediction ŷ and the ground-truth label y, e.g., the cross-entropy loss function for binary classification:

L(y, ŷ) = −[ y · log ŷ + (1 − y) · log(1 − ŷ) ]
Specifically, stochastic gradient descent back-propagation optimization can be employed to train TITV’s model parameters in an end-to-end manner.

Figure 7: Prediction Module of TITV.
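The Prediction Module and one training step can be written as a self-contained sketch. Here random tensors stand in for the two subnetworks' outputs γ and α_t (in TITV they are produced by the Time-Invariant and Time-Variant Modules), and the sizes are illustrative.

```python
# Sketch of Prediction Module: e_t = gamma (+) alpha_t, context vector
# c = sum_t e_t ⊙ x_t, prediction y_hat = sigmoid(<w, c> + b), trained with
# binary cross-entropy. gamma/alpha are random stand-ins for subnetwork outputs.
import torch
import torch.nn as nn

batch, T, n_feats = 4, 7, 10
x = torch.randn(batch, T, n_feats)
y = torch.randint(0, 2, (batch,)).float()
gamma = torch.randn(batch, n_feats, requires_grad=True)       # time-invariant
alpha = torch.randn(batch, T, n_feats, requires_grad=True)    # time-variant

w = nn.Parameter(torch.randn(n_feats))
b = nn.Parameter(torch.zeros(1))

e = gamma.unsqueeze(1) + alpha        # overall importance e_t = gamma + alpha_t
c = (e * x).sum(dim=1)                # context vector: sum_t e_t ⊙ x_t
y_hat = torch.sigmoid(c @ w + b)      # inner product + bias, sigmoid output

loss = nn.functional.binary_cross_entropy(y_hat, y)
loss.backward()                        # gradients reach both importance tensors
print(float(loss))
```

Because the loss back-propagates through e_t into both γ and α_t, the two subnetworks are indeed trained jointly end-to-end, as stated above.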

4.5 Feature Importance for Interpretation

In this subsection, we analyze the importance of each feature value x_t^i to the predicted label ŷ, where x_t^i denotes the value of feature i at time window t of a sample x.

In binary classification, ŷ corresponds to the probability of the sample being classified as the positive class. Expanding the prediction with the context vector c and the overall importance e_t, we have:

ŷ = σ( Σ_{t=1}^{T} ⟨w, e_t ⊙ x_t⟩ + b ) = σ( Σ_{t=1}^{T} Σ_{i=1}^{d} w^i · e_t^i · x_t^i + b )
Then the feature importance of x_t^i to the prediction ŷ can be derived as follows:

I(x_t^i) = w^i · e_t^i = w^i · (γ^i + α_t^i)

where w^i, γ^i and α_t^i correspond to the i-th element of w, γ and α_t respectively. We can observe that both γ^i and α_t^i directly influence the feature importance of x_t^i to ŷ.

Substituting this feature importance into the expanded prediction, we have:

ŷ = σ( Σ_{t=1}^{T} Σ_{i=1}^{d} I(x_t^i) · x_t^i + b )

which demonstrates that all the features contribute to the final prediction of TITV with the corresponding feature importance given by I(x_t^i). Further, a feature value x_t^i with a positive feature importance contributes positively to the final prediction, while one with a negative feature importance contributes negatively. We note that TITV also takes feature interactions into account during this process: interactions of the input features are first modeled by the BIRNN in both Time-Invariant Module and Time-Variant Module, then captured in γ and α_t, and finally integrated into e_t.
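The interpretation identity above is easy to check numerically: summing each feature's importance-weighted contribution over all windows and features, plus the bias, must reproduce the logit of the prediction. The sketch below uses random values in NumPy; all symbols follow the notation of this subsection.

```python
# Numeric check: y_hat = sigmoid(sum_t sum_i I(x_t^i) * x_t^i + b),
# with I(x_t^i) = w^i * (gamma^i + alpha_t^i).
import numpy as np

rng = np.random.default_rng(1)
T, d = 7, 5
x = rng.normal(size=(T, d))            # one sample: T windows, d features
gamma = rng.normal(size=d)             # time-invariant importance
alpha = rng.normal(size=(T, d))        # time-variant importance
w, b = rng.normal(size=d), 0.1

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass: overall importance, context vector, prediction.
e = gamma + alpha                      # (T, d): e_t = gamma + alpha_t
c = (e * x).sum(axis=0)                # context vector
y_hat = sigmoid(w @ c + b)

# Per-feature, per-window importance and the reconstructed logit.
I = w * (gamma + alpha)                # (T, d): I(x_t^i) = w^i (gamma^i + alpha_t^i)
logit = (I * x).sum() + b
assert np.isclose(sigmoid(logit), y_hat)
print(y_hat)
```

This additive decomposition is what makes the per-feature contributions directly readable off the trained model.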

With this feature importance, TRACER can provide interpretable decision support for high stakes applications. Specifically, in healthcare analytics, TRACER can reveal the changing pattern of features over time for each patient, showing the influence of the varying time-series features, i.e., supporting patient-level interpretation analysis; TRACER can therefore assist doctors in pinpointing the underlying problems of the patient. For each feature, TRACER can identify its changing pattern over time on a cohort of patients, and further facilitate doctors’/clinicians’ understanding of the development of certain diseases, i.e., supporting feature-level interpretation analysis.

5 Experiments

5.1 Experimental Set-Up

5.1.1 Datasets and Applications

We evaluate TRACER on two real-world longitudinal EMR datasets, the NUH-AKI dataset and the MIMIC-III dataset.

Figure 8: AKI detection criteria: (i) absolute AKI, and (ii) relative AKI.

The NUH-AKI dataset is a sub-dataset extracted from the EMR data of the National University Hospital in Singapore, recording more than 100,000 patients' EMR data in 2012. In this dataset, we target hospital-acquired AKI prediction, i.e., predicting whether a patient will develop AKI during a hospitalized admission. As explained by medical experts, AKI is defined according to the KDIGO criterion [38]. The definition of AKI is based on the laboratory test serum creatinine (SCr), and there are two AKI detection criteria, absolute AKI and relative AKI (as illustrated in Figure 8). Absolute AKI refers to the situation when the SCr value increases by more than 26.5 µmol/L within the past 48 hours, while relative AKI refers to the case when the SCr value increases to more than 1.5 times the lowest SCr value within the past seven days. For each hospitalized admission of a patient, both AKI detection criteria are checked to derive the label of this admission, and either criterion can cause the AKI label to be positive.
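The two detection criteria can be sketched as follows. This is a hedged, simplified illustration using the standard KDIGO thresholds (an absolute SCr rise of more than 26.5 µmol/L within 48 hours, or a rise to more than 1.5 times the lowest SCr of the past seven days); the paper's actual labeling pipeline may differ in details such as timestamp handling.

```python
# Hedged sketch of the two AKI detection criteria, using the standard
# KDIGO thresholds. scr is a list of (hour, value) SCr measurements in
# chronological order; values are in umol/L. Illustration only.

def absolute_aki(scr):
    # Any rise of more than 26.5 umol/L between two measurements taken
    # at most 48 hours apart triggers absolute AKI.
    for i, (t_i, v_i) in enumerate(scr):
        for t_j, v_j in scr[i + 1:]:
            if t_j - t_i <= 48 and v_j - v_i > 26.5:
                return True
    return False

def relative_aki(scr):
    # A value above 1.5x the lowest SCr of the preceding 7 days (168
    # hours) triggers relative AKI.
    for i, (t_i, v_i) in enumerate(scr):
        window = [v for t, v in scr[:i] if t_i - t <= 7 * 24]
        if window and v_i > 1.5 * min(window):
            return True
    return False

def aki_label(scr):
    # Either criterion suffices for a positive label.
    return absolute_aki(scr) or relative_aki(scr)
```

A gradual rise over several days can satisfy the relative criterion without ever triggering the absolute one, which is why both checks are needed.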

In the NUH-AKI dataset, each hospitalized admission is used as a sample. If a sample is positive, i.e., the patient develops AKI in this admission, we record the time when AKI is detected, trace two days back in time as the "Prediction Window", which is not used as input, and continue to trace seven more days back in time as the "Feature Window", which is used to construct the input time series. Otherwise, if a sample is negative without developing AKI, the time recorded for the latest medical feature in this admission is used to determine the Prediction Window and the Feature Window. The relationship between the Feature Window and the Prediction Window is shown in Figure 9. In hospital-acquired AKI prediction, we utilize the time-series laboratory tests of each sample in the Feature Window as input to predict whether the patient will develop AKI in this admission within two days.

Figure 9: Feature Window and Prediction Window in hospital-acquired AKI prediction.

The MIMIC-III dataset [34] is a public dataset spanning from 2001 to 2012, recording EMR data for ICU patients admitted to the critical care units. In this dataset, we conduct in-hospital mortality prediction with the time-series laboratory tests as input. Specifically, each admission corresponds to one visit of a patient to the hospital, and if the patient stays in the hospital for more than 48 hours in an admission, we use it as one sample. The mortality label is derived by checking whether the patient passes away in the hospital during this admission. The corresponding time series is then extracted from the laboratory tests of this admission.

We summarize some important statistics for both datasets in Table 1. Specifically, we divide the Feature Window into a number of time windows by the window length and average the values of the same laboratory test within each time window, which is a typical way to transform EMR data for analytics [99, 16, 54]. Then we conduct feature normalization on the laboratory test values to obtain the normalized values as input for analytics. We note that while the laboratory tests used in the experiments are numerical features, TRACER can readily deal with categorical or discrete features by transforming them into numerical features via standard preprocessing steps (e.g., sklearn.preprocessing.OneHotEncoder [81], pandas.get_dummies [66]) before feeding them as input.
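The preprocessing described above can be sketched as follows. This is a hedged, stdlib-only illustration of per-window averaging, normalization, and one-hot encoding; the normalization scheme (min-max) is an assumption for exposition, and in practice the one-hot step would typically use sklearn's OneHotEncoder or pandas.get_dummies as noted in the text.

```python
# Hedged sketch of the preprocessing: average each laboratory test
# within fixed-length time windows, normalize, and one-hot encode a
# categorical feature. Min-max normalization is assumed for illustration.

def window_average(events, window_len, num_windows):
    """events: list of (time, value) measurements of one test. Returns
    per-window means (None when a window has no measurement)."""
    sums, counts = [0.0] * num_windows, [0] * num_windows
    for t, v in events:
        w = int(t // window_len)
        if 0 <= w < num_windows:
            sums[w] += v
            counts[w] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def min_max(values):
    # Scale observed values to [0, 1]; missing windows stay None.
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1.0
    return [None if v is None else (v - lo) / span for v in values]

def one_hot(value, categories):
    # Turn a categorical value into a numerical indicator vector.
    return [1.0 if value == c else 0.0 for c in categories]
```

Missing windows are kept as None here; a real pipeline would impute or mask them before feeding the series to the model.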

Feature Number
Sample Number
Positive Sample Number
Negative Sample Number
Feature Window Length: 7 days / 48 hours
Time Window Length: 1 day / 2 hours
Time Window Number: 7 / 24
Table 1: Dataset Statistics (NUH-AKI / MIMIC-III)

5.1.2 Baseline Methods

We compare TRACER with LR, Gradient Boosting Decision Tree (GBDT), the standard BIRNN, and several state-of-the-art methods including RETAIN [22] and variants of Dipole [57]. The details of these baselines are as follows.


  • LR takes the aggregated time-series EMR data as input for prediction. The aggregation operation calculates the average value of the same feature across the time series.

  • GBDT is an ensemble model composed of decision trees, which also takes the aggregated time-series EMR data as input.

  • BIRNN takes time-series EMR data as input and uses the BIRNN’s last hidden state for prediction.

  • RETAIN [22] is a reverse time attention model which devises a two-level attention mechanism, i.e., visit-level attention and feature-level attention to facilitate interpretability.

  • Dipole [57] is an attention-based BIRNN model which can achieve interpretability for each visit through the three different attention mechanisms below.

  • Dipole (location) is Dipole with a location-based attention mechanism, in which the attention weights are computed solely based on the current hidden state.

  • Dipole (general) is Dipole with a general attention mechanism, in which a matrix is used to capture the relationship between every two hidden states.

  • Dipole (concatenation) is Dipole with a concatenation-based attention mechanism, in which the attention weights are computed from the concatenation of the current hidden state and each previous hidden state.

For the experiments, we randomly partition the samples into training, validation and test sets. During training, for each approach (either TRACER or a baseline), the hyperparameters that achieve the best performance on the validation data are chosen and then applied to the test data for reporting experimental results. For both applications, formalized as binary classification, we choose the area under the ROC curve (AUC) as well as the mean cross-entropy loss (CEL) per sample as evaluation metrics; an accurate prediction model should have a high AUC value but a low CEL value. We then report the AUC and CEL values averaged over repeated runs on the test data.
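The two evaluation metrics can be computed as follows. This is a hedged, stdlib-only sketch (AUC via the rank-based Mann-Whitney formulation); in practice sklearn.metrics.roc_auc_score and log_loss would typically be used instead.

```python
# Hedged, stdlib-only sketch of the two evaluation metrics: AUC via the
# Mann-Whitney (pairwise comparison) formulation, and the mean
# cross-entropy loss per sample.
import math

def auc(labels, scores):
    # Fraction of (positive, negative) pairs where the positive sample
    # is scored higher; ties count as half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_cel(labels, probs, eps=1e-12):
    # Mean cross-entropy loss per sample; probabilities are clipped to
    # avoid log(0).
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

A perfect ranking yields AUC 1.0, and a model that outputs 0.5 for every sample incurs a CEL of ln 2 per sample, which is a useful sanity baseline.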

Figure 10: Sensitivity analysis of TRACER on rnn_dim and film_dim in the NUH-AKI dataset.
Figure 11: Sensitivity analysis of TRACER on rnn_dim and film_dim in the MIMIC-III dataset.

Specifically, for TRACER, we conduct a sensitivity analysis on two critical hyperparameters: (i) rnn_dim, the hidden dimension of the BIRNN in the Time-Variant Module; and (ii) film_dim, the dimension of the feature-wise transformation in the Time-Invariant Module. Both hyperparameters are tuned over a range of candidate values. Figure 10 and Figure 11 illustrate the results of different rnn_dim and film_dim settings on both datasets. Based on the results, we adopt the best-performing hyperparameter setting rnn_dim=128, film_dim=512 in the NUH-AKI dataset and rnn_dim=512, film_dim=64 in the MIMIC-III dataset. Other hyperparameters include the learning rate, the weight decay, and the number of epochs with early stopping.
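The selection procedure described above amounts to a grid search on the validation data. Below is a hedged sketch; the candidate grid (powers of two from 64 to 512) and the train_and_eval callback are assumptions for illustration, not the paper's actual tuning code.

```python
# Hedged sketch of hyperparameter selection: try every (rnn_dim,
# film_dim) pair from a candidate grid, keep the pair with the best
# validation AUC. train_and_eval is a hypothetical stand-in that trains
# a model with the given dimensions and returns its validation AUC.
import itertools

def grid_search(train_and_eval, grid=(64, 128, 256, 512)):
    best_cfg, best_auc = None, float("-inf")
    for rnn_dim, film_dim in itertools.product(grid, grid):
        val_auc = train_and_eval(rnn_dim, film_dim)
        if val_auc > best_auc:
            best_cfg, best_auc = (rnn_dim, film_dim), val_auc
    return best_cfg, best_auc
```

The winning configuration is then frozen and applied once to the test data, so the test set never influences hyperparameter choice.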

5.1.3 Experimental Environment

The experiments are conducted on a server with 2 x Intel Xeon Silver 4114 CPUs (2.2 GHz, 10 cores each), 256 GB of memory, and 8 x GeForce RTX 2080 Ti GPUs. Models are implemented in PyTorch 1.3.1 with CUDA 10.2.

5.2 Prediction Results

5.2.1 Comparison with Baseline Methods

We report the experimental results of LR, GBDT, BIRNN, RETAIN, Dipole with three different attention mechanisms, and our TRACER in Figure 12. In both applications, TRACER achieves the highest AUC values and the lowest CEL values, confirming that the time-invariant and the time-variant feature importance jointly contribute to the modeling of the time-series EMR data and hence, both are essential to improving the prediction performance.

Figure 12: Experimental results of LR, GBDT, BIRNN, RETAIN, the three Dipole variants and TRACER.

In both datasets, TRACER outperforms LR and GBDT significantly; this superiority results from TRACER's capability of utilizing time-series data for analytics.

Compared with RETAIN, TRACER achieves a higher AUC value and a lower CEL value by a large margin. The reason may be two-fold. First, RETAIN incorporates time-series EMR data in the reverse time order and therefore loses the forward time-series information in the data. Second, in the TITV model of TRACER, the devised FiLM mechanism, which captures the time-invariant feature importance as general guidance for model learning, positively influences the performance of TRACER.

Compared with BIRNN, TRACER illustrates better prediction performance in both datasets in terms of AUC and CEL. As for interpretability, TRACER can explain the prediction results via the feature importance, whereas BIRNN is hard to interpret.

Compared with Dipole, TRACER exhibits an obvious advantage in the NUH-AKI dataset, and also outperforms all Dipole variants in the MIMIC-III dataset. From the perspective of interpretability, TRACER models the feature importance, which is more informative than the visit importance modeled by Dipole.

In summary, the proposed TRACER outperforms LR and GBDT significantly due to its capability of modeling time-series data, and outperforms RETAIN by capturing both the time-invariant and the time-variant feature importance. Further, TRACER achieves better interpretability and meanwhile better prediction performance than BIRNN and Dipole in both datasets.

Figure 13: Experimental results of the ablation study.

5.2.2 Ablation Study

We conduct an ablation study of the TITV model in TRACER and illustrate the experimental results in Figure 13. One variant keeps the Time-Invariant Module and the Prediction Module of TITV while removing the Time-Variant Module, to evaluate the influence of the Time-Invariant Module. Similarly, the other variant uses only the Time-Variant Module and the Prediction Module of TITV, in order to demonstrate the influence of the Time-Variant Module.

According to Figure 13, we observe that the Time-Invariant Module and the Time-Variant Module both contribute to boosting the performance of TRACER, as both ablated variants achieve a lower AUC and a higher CEL than the full TRACER.

However, compared with the Time-Invariant Module, the Time-Variant Module has a larger influence on the performance of TRACER, as the Time-Variant-only variant achieves a higher AUC value than the Time-Invariant-only variant in both datasets. When considering CEL, the Time-Variant-only variant outperforms the Time-Invariant-only variant in the NUH-AKI dataset, and the two perform similarly in the MIMIC-III dataset. This indicates the vital influence of the time-variant feature importance in these two applications.

5.2.3 Scalability Test

Figure 14: Experimental results of the scalability test.

We study the scalability of TRACER by measuring the model convergence time under different numbers of GPUs on both the NUH-AKI dataset and the MIMIC-III dataset. As can be observed in Figure 14, the convergence time on the NUH-AKI dataset decreases sub-linearly with respect to the number of GPUs used for training, because the controlling cost (including the gradient aggregation among GPUs, the best checkpoint selection and saving, etc., which cannot be accelerated with more GPUs) becomes more pronounced when the per-GPU workload decreases substantially. In contrast, when the controlling cost becomes less dominant, TRACER yields higher training efficiency as a whole and achieves better scalability on the larger MIMIC-III dataset. The study confirms the scalability of TRACER: it can be accelerated and scaled appropriately with more GPUs depending on the data size and training requirements.

5.3 Patient-Level Interpretation

In this subsection, we report the patient-level interpretation results to demonstrate how TRACER assists doctors in understanding why a certain patient develops AKI in the hospital-acquired AKI prediction or passes away in the in-hospital mortality prediction. We first adopt the best-performing checkpoint of the TITV model in TRACER. Then for each patient, we visualize the patient-level interpretation results by plotting the Feature Importance - Time Window distribution of the features identified by doctors to be informative during the diagnosis process. The Time Window lies within the Feature Window, ranging from 1 day to 7 days in the NUH-AKI dataset and from 2 hours to 48 hours in the MIMIC-III dataset respectively.

5.3.1 AKI Prediction in the NUH-AKI Dataset

Figure 15: Patient-level interpretation results of TRACER in the NUH-AKI dataset.

In the hospital-acquired AKI prediction, we show the patient-level interpretation results of TRACER for two representative patients who developed AKI after 48 hours in Figure 15, with the following features involved: "Neutrophils %" (NEUP), "Ionised CA, POCT" (ICAP), "Sodium, POCT" (NP), "White Blood Cell" (WBC), "Carbon Dioxide" (CO2) and "Serum Sodium" (NA).

As shown in Figure 15 (a), in Patient1's interpretation results provided by TRACER, we observe that the NEUP feature shows increasingly higher importance over time, and WBC is of stable importance. These suggest that Patient1 is suffering from worsening inflammation or infection, to which both NEUP and WBC respond, although they exhibit different Feature Importance changing patterns. We then find that ICAP and NP, two kinds of ionised electrolytes in the human body, have a Feature Importance increasing with time. Given their medical functionality, such as the association of hypocalcemia in AKI with adverse outcomes [1] and of dysnatremia with kidney dysfunction [27], we presume that Patient1 is developing a worsening electrolyte imbalance along with worsening infection, and is thus at high risk of progressing to AKI soon.

For Patient2, illustrated in Figure 15 (b), a relatively high and stable Feature Importance of WBC is observed, which indicates the presence of inflammation or infection. Besides, CO2's (bicarbonate) Feature Importance is also on the increase, which is explained by acidosis that builds up with progressive kidney dysfunction [8], or by worsening lactic acidosis with circulatory shock and end-organ injury including AKI [62]. Once again, the rising Feature Importance changing pattern of NA represents progressive NA-fluid imbalance and worsening kidney function in Patient2 [8, 27].

These findings suggest that the patient-level interpretation results of TRACER are valuable for doctors in identifying the underlying problems of a patient, so that timely interventions can be taken.

5.3.2 Mortality Prediction in the MIMIC-III Dataset

Figure 16: Patient-level interpretation results of TRACER in the MIMIC-III dataset.

In the in-hospital mortality prediction, we probe into two representative patients who passed away, with the Feature Importance changing patterns of five features illustrated in Figure 16: "Oxygen" (O2), "pH" (PH), "Carbon Dioxide" (CO2), "Temperature" (TEMP) and "Base Excess" (BE).

Among these five, the four features O2, CO2 (which, in the MIMIC-III dataset, reflects pCO2 in blood gas analysis), PH and BE are medically closely related through patients' metabolism, respiratory status, and acid-base balance, which are in turn intimately related to major organ functions and illness acuity. For instance, if a patient in the ICU suffers from inadequate oxygenation or ventilation, O2 decreases or pCO2 increases respectively, and the latter reflects respiratory acidosis. Concurrently, a decreasing BE reflects worsening metabolic acidosis, which in turn suggests inadequate acid-base compensation by the patient's deteriorating kidney function. The net result is a lower than normal PH (acidemia). When we investigate the two patients in Figure 16, we observe that these four features tend to exhibit similar Feature Importance changing patterns, possibly due to their similar clinical functionalities in acid-base balance [14].

Furthermore, TEMP is medically shown to be highly related to mortality [12, 51]. This can be exemplified in Figure 16, in which both patients have a TEMP with a relatively large Feature Importance value along with time.

However, when we compare Patient1 in Figure 16 (a) and Patient2 in Figure 16 (b), we find that Patient1's mortality is more highly associated with derangements in oxygenation, ventilation, and acid-base balance, whereas Patient2's mortality seems more associated with extremes of TEMP, which may indicate severe infection.

With such detailed patient-level analysis from TRACER, doctors can better understand the possible terminal processes of a deteriorating patient and recognize those more associated with mortality, so that priorities in therapeutic options can be identified in a personalized manner.

5.4 Feature-Level Interpretation

In this section, we show some feature-level interpretation results in both applications to demonstrate how TRACER functions at the feature level, e.g., helping unveil the characteristics of medical features. Hence, TRACER can provide medically meaningful insights contributing to medical research advancement. In each application, we use the best-performing checkpoint of the TITV model in TRACER to plot the Feature Importance - Time Window distribution of each feature among all samples.
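The feature-level plots described above aggregate the per-sample importance over a cohort. A minimal sketch of that aggregation, assuming each sample yields a [time window][feature] importance matrix as in the patient-level analysis (names are illustrative, not the paper's plotting code):

```python
# Hedged sketch: given per-sample importance matrices of shape [T][F],
# compute the cohort-wide mean importance of one feature per time
# window. This mean curve is what a Feature Importance - Time Window
# plot summarizes for that feature across all samples.

def cohort_importance_curve(importances, feature_idx):
    """importances: list of [T][F] matrices, one per sample.
    Returns the length-T mean importance curve of feature_idx."""
    T = len(importances[0])
    curve = []
    for t in range(T):
        vals = [sample[t][feature_idx] for sample in importances]
        curve.append(sum(vals) / len(vals))
    return curve
```

Plotting all samples rather than only the mean additionally exposes the dispersion patterns discussed below, such as the diverging clusters observed for some features.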

5.4.1 AKI Prediction in the NUH-AKI Dataset

Figure 17: Feature-level interpretation results of TRACER in the NUH-AKI dataset.

In the NUH-AKI dataset, we illustrate the Feature Importance generated by TRACER for six representative laboratory tests in Figure 17, corresponding to "C-Reactive Protein" (CRP), "Neutrophils" (NEU), "Serum Potassium" (K), "Serum Sodium" (NA), "Parathyroid hormone, Intact" (PTH) and "RBC, Urine" (URBC). Interesting patterns are discovered from these laboratory tests, including varying patterns (i.e., CRP, NEU, K, NA, PTH) and a stable pattern (i.e., URBC). Our clinical collaborators have concurred that the observed patterns are medically meaningful. The detailed explanations for the observed patterns in Figure 17 are as follows.

Similar patterns discovered in similar features. As shown in Figure 17 (a) and (b), CRP and NEU tend to share a similar Feature Importance changing pattern over time. This observation is validated by doctors. According to medical literature and doctors' domain expertise, increasing CRP suggests worsening systemic inflammatory activity in response to various disease stressors including active infection, myocardial infarction, and cancers [40, 55, 23], and systemic inflammation may be directly involved in the pathogenesis of AKI by adversely altering the kidney microcirculation [90, 47]. Likewise, NEU is the most abundant type of white blood cell in human blood, responding to bacterial infection, cancers, and vascular diseases, and is an important mediator of inflammation [61, 71, 77]. Due to their similar medical functionality, CRP and NEU show a similar response pattern, with a sudden increase in the latter part of the time span, reflected in their Feature Importance changing patterns over time.

Similarly, K and NA also exhibit similar Feature Importance - Time Window patterns as shown in Figure 17 (c) and (d). This is because both K and NA-water balance are important fluid and electrolytes in the human body which are vital to cellular metabolism and regulated by the kidneys [8]. A patient would suffer derangements in K and NA balance as kidney function deteriorates [35, 27], and these imbalances develop concurrently. Therefore, K and NA behave similarly in terms of Feature Importance changing patterns with time.

Different patterns indicate different clinical functionalities. Based on the analysis above, we can see that, compared with CRP and NEU, K and NA behave differently in terms of the Feature Importance changing pattern over time, as shown in Figure 17, due to their different clinical functionalities.

Besides, PTH is the primary regulator of systemic calcium and phosphate homeostasis in humans and regulates transporters to increase excretion of filtered phosphate in the kidneys [70]. The hypocalcemia and hyperphosphatemia that develop with AKI may up-regulate PTH activity. Skeletal resistance to PTH is also observed in kidney failure [82]. These observations explain the elevated PTH observed in patients with AKI [49, 86]. The closer in time the measured PTH is to Prediction Window of AKI prediction, the more significant its feature importance will be. As shown in Figure 17 (e), PTH shows a high influence on the AKI risk, denoted as a relatively high Feature Importance which increases in significance along with time.

Furthermore, URBC (Figure 17 (f)) exerts a relatively stable influence on the AKI risk in terms of Feature Importance with time. The presence of URBC infers hematuria (blood in urine). Hematuria is commonly observed in kidney (glomerular) diseases and associated kidney dysfunction [92]. Hematuria may suggest a compromise in the glomerular filtration barrier, and its presence has been shown to be strongly associated with kidney disease progression over time [64]. Therefore, URBC may exhibit a stable Feature Importance on the AKI prediction.

5.4.2 Mortality Prediction in the MIMIC-III Dataset

In the MIMIC-III dataset, we illustrate the Feature Importance changing patterns along with time for six representative features in Figure 18, corresponding to “Serum Potassium” (K), “Serum Sodium” (NA), “Temperature” (TEMP), “Mean Corpuscular Hemoglobin Concentration” (MCHC), “Cholesterol, Pleural” (CP) and “Amylase, Urine” (AU). We examine the changing patterns and summarize our findings (which have been validated by our clinical collaborators to be medically meaningful) with the detailed medical correlation.

Figure 18: Feature-level interpretation results of TRACER in the MIMIC-III dataset.

Low Feature Importance detected for common features which are not generally highly related to mortality. As illustrated in Figure 18 (a) and (b), K and NA exhibit the following Feature Importance - Time Window patterns: (i) a flat curve with low Feature Importance values and some fluctuations; (ii) a noisy area with Feature Importance values dispersed over the whole area. We suggest that this phenomenon occurs due to the characteristics of K and NA balance in critically ill patients. These electrolyte disorders are very common in critically ill patients [52]; minor abnormalities may be too general to exert a significant causal effect and hence are not highly related to mortality. This helps explain the flat curve (i) in the figure. We note, however, that there are certain cases when such common features are related to mortality: patients may have varying severity of K and NA imbalance that persists due to illness acuity, poor nutritional intake in worsening disease, intravenous fluids administered in huge quantity, or loss of body fluids in unique clinical situations with high gastrointestinal losses [27, 89]. These may lead to various changing patterns of Feature Importance over time, which correspond to the noisy area (ii) in the figure.

High Feature Importance detected for common features that are generally highly related to mortality. As shown in Figure 18 (c), TEMP has a relatively large and stable Feature Importance with time, which means that TEMP is relatively highly related to mortality. This observation is medically plausible. Extremes of fever and body temperature have prognostic implications in sepsis and are among the key criteria in defining a systemic inflammatory response syndrome [11]. The alterations in TEMP could be infection-related or a host’s response to inflammatory stress of non-infectious origin [65, 29]. High fever could predispose to cardiac arrhythmias, increased oxygen demand, seizures and brain injury in patients and this portends adverse outcomes [58, 6]. Both hyperthermia and hypothermia, and even pharmacological temperature lowering, are associated with higher mortality in critically ill patients [12, 51].

Similar to TEMP, MCHC's Feature Importance remains large over time, as shown in Figure 18 (d); it poses a larger influence on mortality than most other features shown. Low MCHC indicates low hemoglobin concentration per red blood cell in the circulation, and may imply lower oxygen-carrying capacity by blood cells to the tissues. This may explain the observation that lower MCHC is associated with mortality in ICU patients with myocardial infarction [33]. In addition, patients with sepsis and critical illness often develop acidemia or hypophosphatemia, which in turn alters hemoglobin-oxygen affinity and reduces oxygen release to tissues [88]. Therefore, MCHC's effect on mortality might relate to downstream effects on end-organ malperfusion.

Same feature’s diverging patterns indicate different patient clusters. As for CP (Figure 18 (e)) and AU (Figure 18 (f)), we observe that their corresponding Feature Importance changing patterns with time exhibit an apparent diverging phenomenon. We suppose such diverging patterns indicate the clinical functionality of both features in helping divide patients into different clusters.

Specifically, CP is examined to be of supportive clinical value in differentiating two types of pleural effusion: exudative (higher CP) and transudative [28]. Exudative pleural effusions either follow acute or chronic lung infection, or lung cancer; these conditions may relate to more adverse patient outcomes and hence the association with ICU mortality. On the other hand, low CP and transudative effusion may follow severe volume overload in the setting of critical illness and organ injury, and there are multiple studies demonstrating an association with fluid overload and increased ICU mortality [13, 68].

Similarly, the Feature Importance - Time Window pattern of AU is novel and interesting. It also diverges into two types, indicating the presence of different patient clusters. AU level is correlated to serum amylase level [85]. Serum amylase in turn is elevated in severe clinical diseases including acute pancreatitis [37], as well as non-pancreatic abdominal organ injury in trauma [46], and also elevated just with kidney dysfunction due to reduced clearance [79]. In the latter, low AU may occur despite raised serum levels, which may explain the diverging patterns illustrated in Figure 18 (f).

5.5 TRACER for Financial Analytics

We have thus far focused on evaluating TRACER in healthcare analytics. In this section, we demonstrate how to employ TRACER in other high stakes applications, starting with financial analytics, where performance can be greatly improved with automated financial analytic algorithms [75]. Among various financial analytics tasks, stock index prediction is of critical importance for the investment and trading of financial professionals [44].

We evaluate TRACER on the real-world stock index prediction of the NASDAQ-100 Index. Specifically, we use the NASDAQ100 dataset [72], which collects the stock prices of 81 major corporations in NASDAQ-100 and the NASDAQ-100 Index values every minute from July 26, 2016 to December 22, 2016. This prediction is a regression task for the current NASDAQ-100 Index given the recent stock prices of the 81 constituent corporations and the past NASDAQ-100 Index values. Therefore, the index value of each minute is a prediction target and thus corresponds to one sample [72]. In the experiment, the time window is set to one minute and the Feature Window to a fixed number of minutes.
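The construction of regression samples described above can be sketched as follows. This is a hedged illustration: the window length is left as a parameter because its exact value is not restated in this excerpt, and the data layout (per-minute price lists) is an assumption for exposition.

```python
# Hedged sketch: for each minute, the target y is the current index
# value and the input X is the trailing Feature Window of per-minute
# constituent stock prices plus the past index values.

def build_samples(prices, index, window):
    """prices: [minutes][stocks] constituent prices; index: [minutes]
    index values; window: Feature Window length in minutes.
    Returns (X, y) where X[i] has shape [window][stocks + 1]."""
    X, y = [], []
    for t in range(window, len(index)):
        # Each timestep's features: all stock prices plus the index
        # value observed at that minute.
        feats = [prices[s] + [index[s]] for s in range(t - window, t)]
        X.append(feats)
        y.append(index[t])
    return X, y
```

Each minute thus yields one sample, consistent with the sample definition above, and consecutive samples share most of their Feature Window.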

In Figure 19, we show the feature-level interpretation results for three representative stocks:, Inc. (AMZN), Lam Research Corporation (LRCX), and Viacom Inc. (VIAB), which are a top-ranking, a mid-ranking and a bottom-ranking stock in the NASDAQ-100 Index respectively. Figure 19 illustrates the feature-level interpretation of all prediction samples; thus, the dispersion along the y-axis for each feature indicates the variability of the corresponding feature importance over the observed time span.

We can observe in Figure 19 that the Feature Importance of the three stocks is quite stable over different time windows. This is because the prediction is made based on a minute-scale Feature Window; the stability of stock prices is anticipated within such a short period of time.

Further, we can notice that the three stocks exhibit different Feature Importance changing patterns:


  • AMZN [2], a top-ranking stock, has a high but fluctuating Feature Importance in the prediction. The findings demonstrate that AMZN has a significant influence on the NASDAQ-100 Index, and the variance of this influence (either increase or decrease) is large, which reveals that AMZN is an important indicator of the index.

  • LRCX [48], a mid-ranking stock, has a medium Feature Importance across time, with moderate fluctuations among predictions. This shows that although not predominant, such stocks are important constituents of NASDAQ-100 Index. Further, the dispersion of Feature Importance among different samples varies mildly in different time periods, which shows that LRCX is a valuable indicator of the index.

  • VIAB, a bottom-ranking stock, shows a consistently low Feature Importance changing pattern with minor fluctuations. The interpretation results reveal that VIAB has a small influence on the NASDAQ-100 Index. This is corroborated by a later announcement in December 2017 by NASDAQ that VIAB was removed from the index after re-ranking [3].

In a nutshell, the interpretation results can reveal not only the importance but also the variability of the importance of each stock. The availability of such information is critical to the decision making of financial professionals in investment and risk management [74] when managing their portfolios. The experiments confirm the capability of TRACER in providing insights for this high stakes financial application.

Figure 19: Feature-level interpretation results of TRACER in the NASDAQ100 dataset.

5.6 TRACER for Temperature Forecasting

In this section, we demonstrate how to employ TRACER in temperature forecasting, a critical application that targets reducing power consumption and hence improving energy efficiency [93].

We evaluate TRACER in the indoor temperature forecasting application. To be specific, we use the SML2010 dataset [93], a public dataset from the UCI Machine Learning Repository [24] collected from the SMLsystem from March 2012 to May 2012. In this application, we aim to predict the current indoor temperature as a regression task, given time-series sensor data sources as input features. In the experiment, both the time window and the Feature Window are set at the minute scale.

In Figure 20, we illustrate the feature-level interpretation results of TRACER for two representative features, sun light in the south facade and sun light in the west facade, as sun light intensity is apparently one of the key influential factors for the indoor temperature. According to the interpretation results provided by TRACER in Figure 20, we can see that both features are important, but they exhibit different characteristics in terms of Feature Importance changing patterns.


  • The south-facade sun light's rising Feature Importance over time. We note that the SML2010 dataset is collected from March to May at CEU-UCH in Valencia, Spain [93], i.e., during spring in a mid-latitude region where the temperature differs much between day and night, and the sun shines on the south facade mostly in the daytime yet on the west facade in the evening. Therefore, the south-facade sun light can represent the real-time sun light intensity. The nearer it is to the prediction in time, the larger the influence it can pose on the indoor temperature. This agrees with the rising Feature Importance over time calculated by TRACER.

  • The west-facade sun light's stable Feature Importance over time. Different from the south facade, the sun shines on the west facade only in the evening when it is relatively dark. This causes the west-facade sun light to serve as a relatively stable indicator of outdoor darkness (e.g., daytime vs. night, sunny vs. cloudy). Hence, it exhibits a relatively stable Feature Importance over time, with a slight decrease approaching the prediction time. This decrease appears because, in the time windows near the prediction, other features that can represent the real-time sun light intensity, such as the south-facade sun light, are relatively more important. These findings are well reflected in the interpretation results from TRACER.

Based on the experimental results and analysis above, TRACER is shown to help unveil the characteristics of features in this indoor temperature forecasting application and hence provide meaningful information to the corresponding domain practitioners. As a result, we confirm the applicability of TRACER in this critical temperature forecasting application.

Figure 20: Feature-level interpretation results of TRACER in the SML2010 dataset.

6 Conclusions

Interpretability has been recognized to play an essential role in designing analytic models for high stakes applications such as healthcare analytics. Feature importance is one common way to interpret the predictions of analytic models. In this paper, we propose to capture feature importance in two aspects: the time-invariant and the time-variant feature importance, respectively reflecting the overall influence of a feature shared across time and the time-related influence that may vary along with time. We devise TITV to model the time-invariant feature importance via feature-wise transformation, and the time-variant feature importance via self-attention. With TITV as the core component, we propose the TRACER framework to provide accurate and interpretable clinical decision support to doctors, as well as insightful advice to practitioners of other high stakes applications. We first evaluate the effectiveness of TRACER in healthcare analytics by conducting extensive experiments on the NUH-AKI dataset and the MIMIC-III dataset for AKI prediction and mortality prediction. The results show that TRACER provides more accurate predictions than all the baselines on both datasets. Further, the interpretation results have been validated to be medically meaningful by the clinicians. We also evaluate TRACER in a financial application and a temperature forecasting application to demonstrate its applicability in other high stakes applications.
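The two ingredients named above, a feature-wise transformation for time-invariant importance and self-attention for time-variant importance, can be sketched in a few lines of numpy. This is an illustrative approximation under assumed shapes and randomly initialized parameters, not the TITV architecture itself (which builds these components on top of an RNN encoder).

```python
import numpy as np

rng = np.random.default_rng(42)
T, F = 6, 3                  # time windows, features (illustrative sizes)
x = rng.normal(size=(T, F))  # a toy multivariate time series

# (1) Feature-wise (FiLM-style) transformation: scale and shift each
#     feature identically at every timestep, capturing an influence
#     shared over the entire series.
gamma, beta = rng.normal(size=F), rng.normal(size=F)
x_mod = gamma * x + beta     # broadcasts over the time axis

# (2) Scaled dot-product self-attention across time windows: each
#     timestep attends to every other, giving time-variant weights.
def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

scores = x_mod @ x_mod.T / np.sqrt(F)  # (T, T) pairwise similarities
attn = softmax(scores, axis=1)         # each row is a distribution over time
context = attn @ x_mod                 # attention-weighted summary per step

print(attn.shape, context.shape)  # (6, 6) (6, 3)
```

In a trained model, `gamma` would play the role of the time-invariant feature importance and the rows of `attn` the time-variant weights, which is what makes the resulting predictions interpretable at both levels.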

7 Acknowledgments

This research is supported by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-GC-2019-002), and by AI Singapore 100 Experiments (Award Number: AISG-100E-2018-007).


  • [1] F. Afshinnia, K. Belanger, P. M. Palevsky, and E. W. Young (2013) Effect of ionized serum calcium on outcomes in acute kidney injury needing renal replacement therapy: secondary analysis of the acute renal failure trial network study. Renal failure 35 (10), pp. 1310–1318. Cited by: §5.3.1.
  • [2], inc.. Note:, accessed on Feb 17, 2020 Cited by: 1st item.
  • [3] Annual changes to the nasdaq-100 index. Note:, accessed on Feb 17, 2020 Cited by: 3rd item.
  • [4] D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In ICLR, Cited by: §2.3.
  • [5] T. Bai, S. Zhang, B. L. Egleston, and S. Vucetic (2018) Interpretable representation learning for healthcare via capturing disease progression through time. In KDD, pp. 43–51. Cited by: §2.3.
  • [6] S. A. Bernard, T. W. Gray, M. D. Buist, B. M. Jones, W. Silvester, G. Gutteridge, and K. Smith (2002) Treatment of comatose survivors of out-of-hospital cardiac arrest with induced hypothermia. New England Journal of Medicine 346 (8), pp. 557–563. Cited by: §5.4.2.
  • [7] O. Biran and C. Cotton (2017) Explanation and justification in machine learning: a survey. In IJCAI-17 workshop on explainable AI (XAI), Vol. 8, pp. 1. Cited by: §2.2.
  • [8] D. A. Black (1969) The biochemistry of renal failure.. Br J Anaesth 41, pp. 264–268. Cited by: §5.3.1, §5.4.1.
  • [9] Blood urea nitrogen (bun). Note:, accessed on Feb 17, 2020 Cited by: §1.
  • [10] M. Boehm, M. Dusenberry, D. Eriksson, A. V. Evfimievski, F. M. Manshadi, N. Pansare, B. Reinwald, F. Reiss, P. Sen, A. Surve, and S. Tatikonda (2016) SystemML: declarative machine learning on spark. PVLDB 9 (13), pp. 1425–1436. Cited by: §1.
  • [11] R. C. Bone, R. A. Balk, F. B. Cerra, R. P. Dellinger, A. M. Fein, W. A. Knaus, R. M. Schein, and W. J. Sibbald (1992) Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Chest 101 (6), pp. 1644–1655. Cited by: §5.4.2.
  • [12] D. P. Bota, F. L. Ferreira, C. Mélot, and J. L. Vincent (2004) Body temperature alterations in the critically ill. Intensive care medicine 30 (5), pp. 811–816. Cited by: §5.3.2, §5.4.2.
  • [13] J. Bouchard, S. B. Soroko, G. M. Chertow, J. Himmelfarb, T. A. Ikizler, E. P. Paganini, and R. L. Mehta (2009) Fluid accumulation, survival and recovery of kidney function in critically ill patients with acute kidney injury. Kidney international 76 (4), pp. 422–427. Cited by: §5.4.2.
  • [14] P. H. Breen (2001) Arterial blood gas and ph analysis: clinical approach and interpretation. Anesthesiology Clinics of North America 19 (4), pp. 885–906. Cited by: §5.3.2.
  • [15] L. Cao, W. Tao, S. An, J. Jin, Y. Yan, X. Liu, W. Ge, A. Sah, L. Battle, J. Sun, R. Chang, M. B. Westover, S. Madden, and M. Stonebraker (2019) Smile: A system to support machine learning on EEG data at scale. PVLDB 12 (12), pp. 2230–2241. Cited by: §2.1.
  • [16] Z. Che, D. C. Kale, W. Li, M. T. Bahadori, and Y. Liu (2015) Deep computational phenotyping. In KDD, pp. 507–516. Cited by: §5.1.1.
  • [17] Z. Che, S. Purushotham, R. G. Khemani, and Y. Liu (2015) Distilling knowledge from deep networks with applications to healthcare domain. CoRR abs/1512.03542. Cited by: §2.3.
  • [18] J. Cheng, L. Dong, and M. Lapata (2016) Long short-term memory-networks for machine reading. In EMNLP, pp. 551–561. Cited by: §4.3.
  • [19] K. Cho, B. van Merrienboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, pp. 1724–1734. Cited by: §2.3.
  • [20] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun (2016) Doctor AI: predicting clinical events via recurrent neural networks. In MLHC, JMLR Workshop and Conference Proceedings, Vol. 56, pp. 301–318. Cited by: §1.
  • [21] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun (2017) GRAM: graph-based attention model for healthcare representation learning. In KDD, pp. 787–795. Cited by: §2.3.
  • [22] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. F. Stewart (2016) RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In NIPS, pp. 3504–3512. Cited by: §1, §1, §2.3, 4th item, §5.1.2.
  • [23] R. D. Dolan, S. T. McSorley, P. G. Horgan, B. Laird, and D. C. McMillan (2017) The role of the systemic inflammatory response in predicting outcomes in patients with advanced inoperable cancer: systematic review and meta-analysis. Critical reviews in oncology/hematology 116, pp. 134–146. Cited by: §5.4.1.
  • [24] D. Dua and C. Graff (2017) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. External Links: Link Cited by: §5.6.
  • [25] V. Dumoulin, E. Perez, N. Schucher, F. Strub, H. d. Vries, A. Courville, and Y. Bengio (2018) Feature-wise transformations. Distill. Note: External Links: Document Cited by: §1, §4.2.
  • [26] Estimated glomerular filtration rate (egfr). Note:, accessed on Feb 17, 2020 Cited by: §1.
  • [27] X. Gao, C. Zheng, M. Liao, H. He, Y. Liu, C. Jing, F. Zeng, and Q. Chen (2019) Admission serum sodium and potassium levels predict survival among critically ill patients with acute kidney injury: a cohort study. BMC nephrology 20 (1), pp. 1–10. Cited by: §5.3.1, §5.3.1, §5.4.1, §5.4.2.
  • [28] A. Hamal, K. Yogi, N. Bam, S. Das, and R. Karn (2013) Pleural fluid cholesterol in differentiating exudative and transudative pleural effusion. Pulmonary medicine 2013. Cited by: §5.4.2.
  • [29] J. S. Hawksworth, D. Leeser, R. M. Jindal, E. Falta, D. Tadaki, and E. A. Elster (2009) New directions for induction immunosuppression strategy in solid organ transplantation. The American Journal of Surgery 197 (4), pp. 515–524. Cited by: §5.4.2.
  • [30] K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In European Conference on Computer Vision, pp. 630–645. Cited by: §4.4.
  • [31] Hemoglobin a1c. Note:, accessed on Feb 17, 2020 Cited by: §1.
  • [32] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. Cited by: §2.3.
  • [33] Y. Huang and Z. Hu (2016) Lower mean corpuscular hemoglobin concentration is associated with poorer outcomes in intensive care unit admitted patients with acute myocardial infarction. Annals of translational medicine 4 (10). Cited by: §5.4.2.
  • [34] A. E. Johnson, T. J. Pollard, L. Shen, L. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark (2016) MIMIC-iii, a freely accessible critical care database. Scientific data 3. Cited by: §5.1.1.
  • [35] S. Jung, H. Kim, S. Park, J. H. Jhee, H. Yun, H. Kim, Y. K. Kee, C. Yoon, H. J. Oh, T. I. Chang, et al. (2016) Electrolyte and mineral disturbances in septic acute kidney injury patients undergoing continuous renal replacement therapy. Medicine 95 (36). Cited by: §5.4.1.
  • [36] A. Karpathy, J. Johnson, and F. Li (2015) Visualizing and understanding recurrent networks. CoRR abs/1506.02078. Cited by: §2.3.
  • [37] V. Keim, N. Teich, F. Fiedler, W. Hartig, G. Thiele, and J. Mössner (1998) A comparison of lipase and amylase in the diagnosis of acute pancreatitis in patients with abdominal pain.. Pancreas 16 (1), pp. 45–49. Cited by: §5.4.2.
  • [38] J. A. Kellum, N. Lameire, P. Aspelin, R. S. Barsoum, E. A. Burdmann, S. L. Goldstein, C. A. Herzog, M. Joannidis, A. Kribben, A. S. Levey, et al. (2012) Kidney disease: improving global outcomes (kdigo) acute kidney injury work group. kdigo clinical practice guideline for acute kidney injury. Kidney international supplements 2 (1), pp. 1–138. Cited by: §5.1.1.
  • [39] A. N. Kho, M. G. Hayes, L. Rasmussen-Torvik, J. A. Pacheco, W. K. Thompson, L. L. Armstrong, J. C. Denny, P. L. Peissig, A. W. Miller, W. Wei, et al. (2011) Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. Journal of the American Medical Informatics Association 19 (2), pp. 212–218. Cited by: §2.3.
  • [40] M. H. Kim, J. Y. Ahn, J. E. Song, H. Choi, H. W. Ann, J. K. Kim, J. H. Kim, Y. D. Jeon, S. B. Kim, S. J. Jeong, et al. (2015) The c-reactive protein/albumin ratio as an independent predictor of mortality in patients with severe sepsis or septic shock treated with early goal-directed therapy. PLoS One 10 (7), pp. e0132109. Cited by: §5.4.1.
  • [41] T. Kim, I. Song, and Y. Bengio (2017) Dynamic layer normalization for adaptive neural acoustic modeling in speech recognition. In INTERSPEECH, pp. 2411–2415. Cited by: §1, §4.2.
  • [42] P. W. Koh and P. Liang (2017) Understanding black-box predictions via influence functions. In ICML, Proceedings of Machine Learning Research, Vol. 70, pp. 1885–1894. Cited by: §1.
  • [43] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In NIPS, pp. 1106–1114. Cited by: §2.3.
  • [44] B. Krollner, B. J. Vanstone, and G. R. Finnie (2010) Financial time series forecasting with machine learning techniques: a survey.. In ESANN, Cited by: §5.5.
  • [45] A. Kumar, R. McCann, J. F. Naughton, and J. M. Patel (2015) Model selection management systems: the next frontier of advanced analytics. SIGMOD Record 44 (4), pp. 17–22. Cited by: §1.
  • [46] S. Kumar, S. Sagar, A. Subramanian, V. Albert, R. M. Pandey, and N. Kapoor (2012) Evaluation of amylase and lipase levels in blunt trauma abdomen patients. Journal of emergencies, trauma, and shock 5 (2), pp. 135. Cited by: §5.4.2.
  • [47] W. Lai, Y. Tang, X. R. Huang, P. M. Tang, A. Xu, A. J. Szalai, T. Lou, and H. Y. Lan (2016) C-reactive protein promotes acute kidney injury via smad3-dependent inhibition of cdk2/cyclin e. Kidney international 90 (3), pp. 610–626. Cited by: §5.4.1.
  • [48] Lam research corporation. Note:, accessed on Feb 17, 2020 Cited by: 2nd item.
  • [49] D. E. Leaf, M. Wolf, S. S. Waikar, H. Chase, M. Christov, S. Cremers, and L. Stern (2012) FGF-23 levels in patients with aki and risk of adverse outcomes. Clinical Journal of the American Society of Nephrology 7 (8), pp. 1217–1223. Cited by: §5.4.1.
  • [50] Y. LeCun, Y. Bengio, and G. E. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §2.3.
  • [51] B. H. Lee, D. Inui, G. Y. Suh, J. Y. Kim, J. Y. Kwon, J. Park, K. Tada, K. Tanaka, K. Ietsugu, K. Uehara, et al. (2012) Association of body temperature and antipyretic treatments with mortality of critically ill patients with and without sepsis: multi-centered prospective observational study. Critical care 16 (1), pp. R33. Cited by: §5.3.2, §5.4.2.
  • [52] J. W. Lee (2010) Fluid and electrolyte disturbances in critically ill patients. Electrolytes & Blood Pressure 8 (2), pp. 72–81. Cited by: §5.4.2.
  • [53] Z. J. Ling, Q. T. Tran, J. Fan, G. C. Koh, T. Nguyen, C. S. Tan, J. W. Yip, and M. Zhang (2014) GEMINI: an integrative healthcare analytics system. Proceedings of the VLDB Endowment 7 (13), pp. 1766–1771. Cited by: §1, §2.1.
  • [54] Z. C. Lipton, D. C. Kale, C. Elkan, and R. C. Wetzel (2016) Learning to diagnose with LSTM recurrent neural networks. In ICLR (Poster), Cited by: §1, §5.1.1.
  • [55] G. Liuzzo, L. M. Biasucci, J. R. Gallimore, R. L. Grillo, A. G. Rebuzzi, M. B. Pepys, and A. Maseri (1994) The prognostic value of c-reactive protein and serum amyloid a protein in severe unstable angina. New England journal of medicine 331 (7), pp. 417–424. Cited by: §5.4.1.
  • [56] Z. Luo, S. Cai, J. Gao, M. Zhang, K. Y. Ngiam, G. Chen, and W. Lee (2018-04) Adaptive lightweight regularization tool for complex analytics. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), Vol. , pp. 485–496. External Links: Document, ISSN 1063-6382 Cited by: §2.1.
  • [57] F. Ma, R. Chitta, J. Zhou, Q. You, T. Sun, and J. Gao (2017) Dipole: diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In KDD, pp. 1903–1911. Cited by: §1, §2.3, 5th item, §5.1.2.
  • [58] C. A. Manthous, J. B. Hall, D. Olson, M. Singh, W. Chatila, A. Pohlman, R. Kushner, G. A. Schmidt, and L. Wood (1995) Effect of cooling on oxygen consumption in febrile critically ill patients.. American journal of respiratory and critical care medicine 151 (1), pp. 10–14. Cited by: §5.4.2.
  • [59] T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur (2010) Recurrent neural network based language model. In INTERSPEECH, pp. 1045–1048. Cited by: §2.3.
  • [60] T. Miller (2019) Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, pp. 1–38. Cited by: §2.2.
  • [61] I. Mishalian, Z. Granot, and Z. G. Fridlender (2017) The diversity of circulating neutrophils in cancer. Immunobiology 222 (1), pp. 82–88. Cited by: §5.4.1.
  • [62] B. A. Mizock, J. Falk, et al. (1992) Lactic acidosis in critical illness. Crit Care Med 20 (1), pp. 80–93. Cited by: §5.3.1.
  • [63] C. Molnar et al. (2018) Interpretable machine learning: a guide for making black box models explainable. E-book at https://christophm.github.io/interpretable-ml-book/, version dated 10. Cited by: §1, §2.2.
  • [64] J. A. Moreno, C. Yuste, E. Gutiérrez, Á. M. Sevillano, A. Rubio-Navarro, J. M. Amaro-Villalobos, M. Praga, and J. Egido (2016) Haematuria as a risk factor for chronic kidney disease progression in glomerular diseases: a review. Pediatric nephrology 31 (4), pp. 523–533. Cited by: §5.4.1.
  • [65] N. P. O’Grady, P. S. Barie, J. G. Bartlett, T. Bleck, K. Carroll, A. C. Kalil, P. Linden, D. G. Maki, D. Nierman, W. Pasculle, et al. (2008) Guidelines for evaluation of new fever in critically ill adult patients: 2008 update from the american college of critical care medicine and the infectious diseases society of america. Critical care medicine 36 (4), pp. 1330–1349. Cited by: §5.4.2.
  • [66] Pandas.get_dummies. Note:, accessed on Feb 17, 2020 Cited by: §5.1.1.
  • [67] A. Parikh, O. Täckström, D. Das, and J. Uszkoreit (2016) A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2249–2255. Cited by: §4.4.
  • [68] D. Payen, A. C. de Pont, Y. Sakr, C. Spies, K. Reinhart, and J. L. Vincent (2008) A positive fluid balance is associated with a worse outcome in patients with acute renal failure. Critical care 12 (3), pp. R74. Cited by: §5.4.2.
  • [69] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. C. Courville (2018) FiLM: visual reasoning with a general conditioning layer. In AAAI, pp. 3942–3951. Cited by: §1, §4.2.
  • [70] M. F. Pfister, E. Lederer, J. Forgo, U. Ziegler, M. Lötscher, E. S. Quabius, J. Biber, and H. Murer (1997) Parathyroid hormone-dependent degradation of type ii na+/pi cotransporters. Journal of Biological Chemistry 272 (32), pp. 20125–20130. Cited by: §5.4.1.
  • [71] H. Qi, S. Yang, and L. Zhang (2017) Neutrophil extracellular traps and endothelial dysfunction in atherosclerosis and thrombosis. Frontiers in immunology 8, pp. 928. Cited by: §5.4.1.
  • [72] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell (2017) A dual-stage attention-based recurrent neural network for time series prediction. In IJCAI, pp. 2627–2633. Cited by: §5.5.
  • [73] A. Ratner, D. Alistarh, G. Alonso, D. G. Andersen, P. Bailis, S. Bird, N. Carlini, B. Catanzaro, E. Chung, B. Dally, J. Dean, I. S. Dhillon, A. G. Dimakis, P. Dubey, C. Elkan, G. Fursin, G. R. Ganger, L. Getoor, P. B. Gibbons, G. A. Gibson, J. E. Gonzalez, J. Gottschlich, S. Han, K. M. Hazelwood, F. Huang, M. Jaggi, K. G. Jamieson, M. I. Jordan, G. Joshi, R. Khalaf, J. Knight, J. Konecný, T. Kraska, A. Kumar, A. Kyrillidis, J. Li, S. Madden, H. B. McMahan, E. Meijer, I. Mitliagkas, R. Monga, D. G. Murray, D. S. Papailiopoulos, G. Pekhimenko, T. Rekatsinas, A. Rostamizadeh, C. Ré, C. D. Sa, H. Sedghi, S. Sen, V. Smith, A. Smola, D. Song, E. R. Sparks, I. Stoica, V. Sze, M. Udell, J. Vanschoren, S. Venkataraman, R. Vinayak, M. Weimer, A. G. Wilson, E. P. Xing, M. Zaharia, C. Zhang, and A. Talwalkar (2019) SysML: the new frontier of machine learning systems. CoRR abs/1904.03257. Cited by: §2.1.
  • [74] Risk management in finance. Note:, accessed on Feb 17, 2020 Cited by: §5.5.
  • [75] Robot analysts outwit humans on investment picks, study shows. Note:, accessed on Feb 17, 2020 Cited by: §5.5.
  • [76] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran (2013) Deep convolutional neural networks for LVCSR. In ICASSP, pp. 8614–8618. Cited by: §2.3.
  • [77] P. Scapini and M. A. Cassatella (2014) Social networking of human neutrophils within the immune system. Blood 124 (5), pp. 710–719. Cited by: §5.4.1.
  • [78] M. Schuster and K. K. Paliwal (1997) Bidirectional recurrent neural networks. IEEE Trans. Signal Processing 45 (11), pp. 2673–2681. Cited by: §4.1.
  • [79] T. Seno, H. Harada, K. Ochi, J. Tanaka, S. Matsumoto, R. Choudhury, T. Mizushima, K. Tsuboi, and M. Ishida (1995) Serum levels of six pancreatic enzymes as related to the degree of renal dysfunction.. American Journal of Gastroenterology 90 (11). Cited by: §5.4.2.
  • [80] Y. Sha and M. D. Wang (2017) Interpretable predictions of clinical outcomes with an attention-based recurrent neural network. In BCB, pp. 233–240. Cited by: §1, §1, §2.3.
  • [81] Sklearn.preprocessing.onehotencoder. Note:, accessed on Feb 17, 2020 Cited by: §5.1.1.
  • [82] P. J. Somerville and M. Kaye (1978) Resistance to parathyroid hormone in renal failure: role of vitamin d metabolites. Kidney international 14 (3), pp. 245–254. Cited by: §5.4.1.
  • [83] J. Sun, F. Wang, J. Hu, and S. Edabollahi (2012) Supervised patient similarity measure of heterogeneous patient records. SIGKDD Explorations 14 (1), pp. 16–24. Cited by: §2.3.
  • [84] J. Tan, T. Ghanem, M. Perron, X. Yu, M. Stonebraker, D. J. DeWitt, M. Serafini, A. Aboulnaga, and T. Kraska (2019) Choosing A cloud DBMS: architectures and tradeoffs. PVLDB 12 (12), pp. 2170–2182. Cited by: §2.1.
  • [85] K. Terui, T. Hishiki, T. Saito, T. Mitsunaga, M. Nakata, and H. Yoshida (2013) Urinary amylase/urinary creatinine ratio (uam/ucr)-a less-invasive parameter for management of hyperamylasemia. BMC pediatrics 13 (1), pp. 205. Cited by: §5.4.2.
  • [86] A. Vijayan, T. Li, A. Dusso, S. Jain, and D. W. Coyne (2015) Relationship of 1, 25 dihydroxy vitamin d levels to clinical outcomes in critically ill patients with acute kidney injury. Journal of nephrology & therapeutics 5 (1). Cited by: §5.4.1.
  • [87] J. Wang, Z. Wei, T. Zhang, and W. Zeng (2016) Deeply-fused nets. arXiv preprint arXiv:1605.07716. Cited by: §4.4.
  • [88] G. M. Watkins, A. Rabelo, L. F. Plzak, and G. F. Sheldon (1974) The left shifted oxyhemoglobin curve in sepsis: a preventable defect. Annals of surgery 180 (2), pp. 213. Cited by: §5.4.2.
  • [89] A. S. Winata, W. Jen, M. L. Teng, W. Hing, S. G. Iyer, V. Ma, and H. Chua (2019) Intravenous maintenance fluid tonicity and hyponatremia after major surgery-a cohort study. International Journal of Surgery 67, pp. 1–7. Cited by: §5.4.2.
  • [90] L. Wu, N. Gokden, and P. R. Mayeux (2007) Evidence for the role of reactive nitrogen species in polymicrobial sepsis-induced renal peritubular capillary dysfunction and tubular injury. Journal of the American Society of Nephrology 18 (6), pp. 1807–1815. Cited by: §5.4.1.
  • [91] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio (2015) Show, attend and tell: neural image caption generation with visual attention. In ICML, JMLR Workshop and Conference Proceedings, Vol. 37, pp. 2048–2057. Cited by: §4.3.
  • [92] C. Yuste, F. Rivera, J. A. Moreno, and J. M. López-Gómez (2016) Haematuria on the spanish registry of glomerulonephritis. Scientific reports 6, pp. 19732. Cited by: §5.4.1.
  • [93] F. Zamora-Martinez, P. Romeu, P. Botella-Rocamora, and J. Pardo (2014) On-line learning of indoor temperature forecasting models towards energy efficiency. Energy and Buildings 83, pp. 162–172. Cited by: 1st item, §5.6, §5.6.
  • [94] C. Zhang, A. Kumar, and C. Ré (2014) Materialization optimizations for feature selection workloads. In SIGMOD Conference, pp. 265–276. Cited by: §1.
  • [95] Y. Zhang, W. Zhang, and J. Yang (2010) I/o-efficient statistical computing with RIOT. In ICDE, pp. 1157–1160. Cited by: §1.
  • [96] K. Zheng, J. Gao, K. Y. Ngiam, B. C. Ooi, and W. L. J. Yip (2017) Resolving the bias in electronic medical records. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, New York, NY, USA, pp. 2171–2180. External Links: ISBN 9781450348874, Link, Document Cited by: §2.1.
  • [97] K. Zheng, W. Wang, J. Gao, K. Y. Ngiam, B. C. Ooi, and W. L. J. Yip (2017) Capturing feature-level irregularity in disease progression modeling. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM ’17, New York, NY, USA, pp. 1579–1588. External Links: ISBN 9781450349185, Link, Document Cited by: §2.1.
  • [98] J. Zhou, J. Liu, V. A. Narayan, and J. Ye (2012) Modeling disease progression via fused sparse group lasso. In KDD, pp. 1095–1103. Cited by: §2.3.
  • [99] J. Zhou, F. Wang, J. Hu, and J. Ye (2014) From micro to macro: data driven phenotyping by densification of longitudinal electronic medical records. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 135–144. Cited by: §5.1.1.
  • [100] J. Zhou, L. Yuan, J. Liu, and J. Ye (2011) A multi-task learning formulation for predicting disease progression. In KDD, pp. 814–822. Cited by: §2.3.