Cardiac events are associated with high mortality rates as well as long-lasting morbidities. They represent a significant burden to healthcare systems such as the UK's National Health Service (NHS), and continued efforts to improve risk prediction are of great importance because they enable preventive interventions (Pike et al., 2016). Existing clinical risk scores such as QRISK2 (Hippisley-Cox et al., 2008) are often used in primary care settings to estimate an individual's 10-year cardiovascular risk. These scores are derived from population studies and take into account well-established risk factors such as hypertension, cholesterol, age, smoking, and diabetes.
Electronic health record (EHR) systems have become ubiquitous, with hospitals routinely collecting large numbers of variables at high frequency (Pike et al., 2016). These systems contain longitudinal patient information such as demographics, laboratory results, medications and diagnoses. Such rich information may provide a patient-tailored training set for predicting the near-term risk of events with state-of-the-art machine learning (ML) approaches (Goldstein et al., 2017). However, EHR analysis remains challenging because the data are high-dimensional, noisy, heterogeneous, sparse and systematically biased. Many studies exist on ML-based risk prediction for patient deterioration in intensive care (RCP, 2017; Johnson et al., 2013) and on population studies for primary care use (Hippisley-Cox et al., 2008).
Various approaches have been investigated to predict disease-unspecific endpoints from EHR data, such as length of stay, risk of re-admission or mortality. While many works attempt to predict future diagnoses from previous ones (Lipton et al., 2015) or from clinical notes, few use EHR data to predict the risk of cardiovascular disease (CVD) events such as stroke (Teoh, 2018) or heart failure (Choi et al., 2017). Teoh (2018) used a recurrent neural network (RNN) to predict the onset of stroke within a year of the last admission, reporting an area under the curve (AUC) of by using temporal information from the diagnosis codes and laboratory values of 8,000 patients. In a recent study, Xu et al. (2019) applied an attention mechanism on top of an RNN and evaluated their models in a multi-task learning setting for predicting the risk of stroke, myocardial infarction (MI) and death within a year. Adding attention resulted only in minor improvements, with the authors reporting an average increase of AUC by . Choi et al. (2017) also proposed an RNN with an attention mechanism to predict the onset of heart failure. Common factors amongst these studies are i) the use of a limited number of features, e.g. only diagnoses and laboratory results, ii) the focus on a single prediction horizon, e.g. the risk of having an event within a year after admission, and iii) no analysis of which features are most relevant for prediction.
In this work, we propose an algorithm, based on a multi-task attention RNN, to predict a patient's first diagnosis of ischemic stroke or myocardial infarction (MI). The method takes as input the established risk factors as well as data from multiple EHR modalities, including diagnoses, procedures, medications, encounters, demographics, laboratory results and vital signs. The main contributions of this paper are three-fold: (i) prediction of the onset of ischemic stroke or MI for different time horizons in single-task and multi-task settings; (ii) evaluation of the relevance of heterogeneous input data, e.g. laboratory results and vital signs; and (iii) exploration of the observed events that lead to a positive prediction, using attention weights.
2 Material and Methods
Data were collected by the Oxford University Hospitals NHS Foundation Trust between 2014 and 2019 as part of routine care. The longitudinal secondary care EHR includes demographic information (i.e. sex, age), admission information (start/end dates, discharge method/destination, and admission type, e.g. in-patient, outpatient, emergency department), ICD-10 coded diagnoses, OPCS-4 coded procedures, medications as British National Formulary (BNF) codes (prescribed both during visits and as take-home medication), results from the laboratory information management system (LIMS, e.g. blood and urine tests), digital bedside observations of vital signs (e.g. systolic blood pressure, temperature), as well as demographic measurements such as body mass index (BMI).
In this study, we focus on two highly prevalent CVDs, namely ischemic stroke (defined by ICD-10 codes I63.X or I69.3) and myocardial infarction (MI; ICD-10 codes I21.X or I25.2). In particular, we include patients who have at least one observation at least 2 weeks prior to the CVD event. Both cohorts are summarised in Table 1. For each cohort, we construct a control group of patients who had never been diagnosed with the respective CVD (neither as a primary nor as a secondary reason for admission). Controls were assigned to diseased patients by nearest-neighbour matching on age, sex, and the number of days containing observations.
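The cohort definition and control matching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record fields (`id`, `sex`, `age`, `obs_days`), the exact match on sex, the Euclidean distance over age and observation-day count, and the greedy 1:1 matching without replacement are all assumptions, since the text does not specify the matching details.

```python
import math

# ICD-10 prefixes taken from the cohort definition in the text.
STROKE_PREFIXES = ("I63", "I69.3")
MI_PREFIXES = ("I21", "I25.2")


def has_event(codes, prefixes):
    """Return True if any ICD-10 code starts with one of the prefixes."""
    return any(code.startswith(prefixes) for code in codes)


def match_controls(cases, candidates):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Assumption: sex must match exactly and distance is Euclidean over
    (age, number of observation days). The source does not state this.
    """
    available = list(candidates)
    pairs = []
    for case in cases:
        same_sex = [c for c in available if c["sex"] == case["sex"]]
        if not same_sex:
            continue  # no eligible control left for this case
        best = min(
            same_sex,
            key=lambda c: math.hypot(c["age"] - case["age"],
                                     c["obs_days"] - case["obs_days"]),
        )
        available.remove(best)
        pairs.append((case["id"], best["id"]))
    return pairs


# Hypothetical toy records, for illustration only.
cases = [
    {"id": "c1", "sex": "F", "age": 70, "obs_days": 12},
    {"id": "c2", "sex": "M", "age": 55, "obs_days": 30},
]
candidates = [
    {"id": "k1", "sex": "F", "age": 71, "obs_days": 10},
    {"id": "k2", "sex": "M", "age": 54, "obs_days": 29},
    {"id": "k3", "sex": "F", "age": 40, "obs_days": 12},
]
pairs = match_controls(cases, candidates)
```

In practice the two matching variables would likely need to be put on a comparable scale before computing distances; the sketch leaves them unscaled for brevity.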
Table 1: Summary of the stroke and MI cohorts per time horizon.
| Time Horizon | Stroke cohort | MI cohort |
2.1 Data representation
Each patient is represented by a sequence of days on which observations are available, which we refer to as observation days. Data were aggregated into a single feature vector for each day. Continuous variables (e.g. vital signs and laboratory results) are represented by the median, median absolute deviation, count of observations and an abnormality flag (indicating whether values fall outside physiological ranges), whereas categorical variables (i.e. medication prescriptions) were counted and the 300 most common diagnosis and procedure codes were sparsely encoded. Additional features representing the presence or absence of co-morbid conditions were incorporated using the ICD-10 definition of the Charlson co-morbidity score. Each patient is therefore represented by a single 2D array whose dimensions are the number of observed days and the number of features. Features with less than prevalence were excluded from our analysis. All features were scaled between and population mean imputation was used for missing data points.
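The daily aggregation of a continuous variable, followed by imputation and scaling, could look roughly like the sketch below. The function names and the physiological limits in the example are hypothetical. The scaling range is not recoverable from the text, so the `[0, 1]` default used here is purely an assumption, as is flagging a day as abnormal when any single value is out of range.

```python
from statistics import median


def summarise_day(values, low, high):
    """Summarise one continuous variable for one observation day.

    low/high are the physiological limits (hypothetical inputs here).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    abnormal = any(v < low or v > high for v in values)
    return {"median": med, "mad": mad, "count": len(values),
            "abnormal": int(abnormal)}


def impute_and_scale(column, lo=0.0, hi=1.0):
    """Mean-impute missing entries (None), then min-max scale.

    Assumption: the target range [lo, hi] defaults to [0, 1]; the range
    used in the source is not stated in the recovered text.
    """
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in column]
    cmin, cmax = min(filled), max(filled)
    if cmax == cmin:
        return [lo for _ in filled]  # constant feature
    return [lo + (v - cmin) * (hi - lo) / (cmax - cmin) for v in filled]


# Example: three systolic blood pressure readings on one day.
day = summarise_day([120, 130, 190], low=90, high=140)
scaled = impute_and_scale([1.0, None, 3.0])
```

Because min-max scaling is linear, imputing before or after scaling gives the same result for the population mean, so the order chosen here does not affect the output.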
2.2 Patient trajectory prediction algorithms
For the task of predicting CVD events from a sequence of days containing hospital observations, the following approaches are evaluated.
2.2.1 Baseline clinical approach
The QRISK algorithm (Hippisley-Cox et al., 2008) is the standard clinical assessment tool in UK primary care and calculates the 10-year risk of cardiovascular disease. The score is based on Cox proportional hazards models trained on well-known population risk factors. Since secondary care EHR data do not contain all risk factors from the QRISK model, missing coefficients were set to zero. The method considers a single observation; therefore, the last observation of each subject before the event was used.
2.2.2 Baseline ML approach
Logistic regression (LR) with L1 regularisation, which accounts for the sparse feature space, was used as the baseline ML method. To allow a fair comparison and make use of the information contained in multiple past observation days, observations from previous time steps were concatenated as additional features. We evaluated the inclusion of 1, 50 and 100 days, denoted here as LR-1, LR-50 and LR-100.
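The concatenation of past observation days into a single input vector for LR-1, LR-50 and LR-100 can be illustrated with a small helper. This is a sketch only: the function name is hypothetical, and left-padding with zeros for patients with fewer than `k` observation days is an assumption, since the text does not say how short histories were handled.

```python
def concat_last_days(days, k):
    """Flatten the last k daily feature vectors into one input vector.

    days: list of per-day feature lists, oldest first.
    Assumption: patients with fewer than k days are left-padded with
    zero vectors so every patient yields k * n_features values.
    """
    n_features = len(days[0])
    recent = days[-k:]
    padding = [[0.0] * n_features] * (k - len(recent))
    flat = []
    for day in padding + recent:
        flat.extend(day)
    return flat


# Toy patient with three observation days and two features per day.
patient = [[1, 2], [3, 4], [5, 6]]
x_lr1 = concat_last_days(patient, 1)
x_lr2 = concat_last_days(patient, 2)
x_lr4 = concat_last_days(patient, 4)
```

The resulting fixed-length vectors can then be passed to any L1-regularised logistic regression implementation; the feature count grows linearly with `k`, which is one reason the sparsity-inducing penalty matters.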
| Time horizon | Model | AUC | SEN | PREC | AUC | SEN | PREC |
|---|---|---|---|---|---|---|---|
| 1 month | mtgru | 0.763 ± 0.032 | 0.694 ± 0.124 | 0.138 ± 0.016 | 0.722 ± 0.014 | 0.608 ± 0.087 | 0.188 ± 0.035 |
| | mtattgru | 0.756 ± 0.033 | 0.696 ± 0.136 | 0.131 ± 0.013 | 0.733 ± 0.025 | 0.684 ± 0.101 | 0.165 ± 0.024 |
| | GRU | 0.703 ± 0.038 | 0.611 ± 0.155 | 0.727 ± 0.082 | 0.703 ± 0.047 | 0.694 ± 0.144 | 0.675 ± 0.040 |
| | LR-50 | 0.702 ± 0.054 | 0.772 ± 0.136 | 0.657 ± 0.062 | 0.597 ± 0.028 | 0.546 ± 0.134 | 0.631 ± 0.020 |
| | QRISK | 0.471 ± 0.023 | 0.834 ± 0.155 | 0.516 ± 0.013 | 0.486 ± 0.041 | 0.567 ± 0.31 | 0.564 ± 0.038 |
| 3 months | mtgru | 0.773 ± 0.010 | 0.707 ± 0.061 | 0.355 ± 0.043 | 0.734 ± 0.025 | 0.698 ± 0.069 | 0.371 ± 0.018 |
| | mtattgru | 0.759 ± 0.016 | 0.734 ± 0.100 | 0.326 ± 0.027 | 0.732 ± 0.025 | 0.683 ± 0.091 | 0.363 ± 0.017 |
| | GRU | 0.788 ± 0.008 | 0.682 ± 0.032 | 0.769 ± 0.012 | 0.726 ± 0.037 | 0.654 ± 0.122 | 0.711 ± 0.066 |
| | LR-50 | 0.740 ± 0.024 | 0.678 ± 0.094 | 0.693 ± 0.020 | 0.679 ± 0.032 | 0.699 ± 0.092 | 0.641 ± 0.034 |
| | QRISK | 0.474 ± 0.037 | 0.681 ± 0.205 | 0.516 ± 0.013 | 0.479 ± 0.035 | 0.431 ± 0.326 | 0.556 ± 0.058 |
| 12 months | mtgru | 0.811 ± 0.018 | 0.794 ± 0.045 | 0.594 ± 0.041 | 0.801 ± 0.010 | 0.765 ± 0.065 | 0.637 ± 0.033 |
| | mtattgru | 0.797 ± 0.012 | 0.718 ± 0.055 | 0.610 ± 0.042 | 0.780 ± 0.013 | 0.647 ± 0.033 | 0.672 ± 0.015 |
| | GRU | 0.850 ± 0.011 | 0.769 ± 0.035 | 0.789 ± 0.029 | 0.793 ± 0.024 | 0.724 ± 0.089 | 0.731 ± 0.024 |
| | LR-50 | 0.758 ± 0.005 | 0.660 ± 0.034 | 0.714 ± 0.014 | 0.737 ± 0.018 | 0.691 ± 0.064 | 0.692 ± 0.034 |
| | QRISK | 0.475 ± 0.02 | 0.72 ± 0.241 | 0.509 ± 0.004 | 0.472 ± 0.025 | 0.466 ± 0.357 | 0.574 ± 0.072 |
| months | mtgru | 0.897 ± 0.015 | 0.779 ± 0.023 | 0.847 ± 0.027 | 0.849 ± 0.009 | 0.774 ± 0.030 | 0.768 ± 0.018 |
| | mtattgru | 0.885 ± 0.014 | 0.769 ± 0.028 | 0.824 ± 0.029 | 0.853 ± 0.010 | 0.749 ± 0.058 | 0.791 ± 0.027 |
| | GRU | 0.891 ± 0.009 | 0.777 ± 0.013 | 0.845 ± 0.008 | 0.868 ± 0.004 | 0.768 ± 0.053 | 0.799 ± 0.039 |
| | LR-50 | 0.785 ± 0.008 | 0.706 ± 0.026 | 0.717 ± 0.020 | 0.785 ± 0.017 | 0.686 ± 0.055 | 0.745 ± 0.017 |
| | QRISK | 0.473 ± 0.017 | 0.775 ± 0.248 | 0.506 ± 0.003 | 0.468 ± 0.021 | 0.577 ± 0.293 | 0.53 ± 0.04 |

Table 2: Model performances for predicting myocardial infarction and ischemic stroke throughout all time horizons. Shown are the average and standard deviation of AUC, sensitivity (SEN), and precision (PREC) over 5-fold cross-validation. Best average results per time horizon and disease are highlighted in bold.
2.2.3 Proposed Recurrent Neural Network
In order to leverage the longitudinal EHR data, we propose a gated recurrent unit (GRU) RNN-based approach. In the single-task scenario, the model predicts CVD in a single time horizon based on the historical information provided. We further extend the model into a multi-task GRU (mtgru) prediction model such that, given the past history, the model predicts the risk for a sequence of time horizons simultaneously. Further, we use attention (Luong et al., 2015) to assign weights to each day of a patient's health record when predicting the target disease. The final model is referred to as mtattgru. Hyperparameter search was performed using Bayesian optimisation. Zero padding was applied to pad sequences to a fixed length, which is also a hyperparameter of each model. Implementation details are available as Supplementary Material.
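To make the role of the attention weights concrete, the sketch below computes dot-product attention over per-day hidden states while ignoring zero-padded days. It is an illustration under stated assumptions rather than the authors' model: the dot-product score is one of the variants described by Luong et al. (2015) and the text does not say which one was used, the masking of padded days is assumed, and the hidden states here are hand-written toy values rather than GRU outputs.

```python
import math


def attention(hidden_states, query, mask):
    """Dot-product attention over per-day hidden states.

    hidden_states: one vector per (padded) observation day.
    query: vector the days are scored against, e.g. the last hidden state.
    mask: 1 for a real observation day, 0 for zero padding.
    Assumes at least one real day is present.
    """
    scores = [sum(h * q for h, q in zip(state, query))
              for state in hidden_states]
    # Subtract the maximum real score for numerical stability.
    max_score = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - max_score) if m else 0.0
            for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * state[j] for w, state in zip(weights, hidden_states))
               for j in range(len(query))]
    return weights, context


# Toy example: the first day is padding, the last day matches the query.
states = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights, context = attention(states, query=[0.0, 1.0], mask=[0, 1, 1])
```

The `weights` list holds one value per day and sums to one over the real days, which is what allows the per-day relevance to be inspected after training.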
2.2.4 Performance evaluation
ML and clinical approaches were assessed for 1, 3, 12 and months time horizons using 5-fold cross-validation. Prediction performance is measured in terms of the area under the curve (AUC) of the receiver operating characteristic (ROC), as well as sensitivity (SEN) and precision (PREC). To further characterise the models, we calculated the permutation feature importance post hoc as the change in F score after randomising each feature individually. In addition, we qualitatively assessed the attention weights of trained mtattgru models to demonstrate how the models incorporate information about previous observation days.
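Permutation feature importance, as described above, can be sketched in a few lines. The helper names, the number of repeats and the toy classifier are hypothetical, and the use of the F1 score specifically is an assumption (the text only says "F score"). The model is treated as a black-box `predict` function, which is what makes the analysis post hoc.

```python
import random


def f1_score(y_true, y_pred):
    """F1 score for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def permutation_importance(predict, rows, y_true, n_repeats=20, seed=0):
    """Mean drop in F1 after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = f1_score(y_true, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and y
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(rows, column)]
            drops.append(baseline - f1_score(y_true, predict(shuffled)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical classifier that only looks at the first feature.
def toy_predict(rows):
    return [1 if r[0] > 0.5 else 0 for r in rows]


rows = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3],
        [0.1, 0.9], [0.7, 0.5], [0.3, 0.2]]
labels = [1, 1, 0, 0, 1, 0]
importances = permutation_importance(toy_predict, rows, labels)
```

Repeating the shuffle several times yields a distribution of score changes per feature, which is the kind of sample a significance test against zero can be run on.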
Table 2 lists the AUC, SEN and PREC for all four investigated time horizons and both diseases. Due to space limitations, only the results for LR-50 are shown in Table 2; extended results are provided in the Supplementary Material. Overall, the proposed GRU models outperform LR and QRISK in all tasks. In shorter time horizons, in which data is limited (see Table 1), both mtgru and mtattgru outperform the single-task GRU. In the case of estimating the risk of MI (Figure 1(a)), the mean AUC values for GRU, mtgru, and mtattgru are , , and , respectively. In longer horizons, by contrast, the GRU methods perform comparably.
Next, we aim to further understand the model's decisions. Figure 2(a) shows the feature importance for the mtgru model evaluated on the MI cohort with a months prediction horizon. Features such as age, previous albumin levels, diagnoses of pleural effusion (J90) and procedure codes for 24h Holter electrocardiogram (U19.2) are deemed important. Last, to explore the relevance of past EHR information, we refer to the attention weights of mtattgru in Figure 2(b). The model focuses on specific observation days throughout a patient's history. Overall, the model seems to weigh more recent observations higher.
months time horizon. In (a), asterisks mark significance from zero with t-test p-value thresholds of 5% (*) and 1% (**). At most, the top 10 features for each category are shown.
An important aspect of the proposed GRU-based approaches is that they are able to incorporate the longitudinal information present in EHR data. The proposed approaches consistently outperform the clinical and ML baselines. While multi-task approaches leverage the availability of data in shorter time horizons, they are less helpful in longer time frames. The addition of attention does not improve model performance. Meanwhile, the QRISK AUC values are close to as expected for the following reasons: i) some of its features are not available in the EHR; and ii) the algorithm was developed for population studies and should indeed under-perform in an age- and sex-matched setup.
Population models such as QRISK use features such as age, systolic blood pressure and BMI as well-known indicators of future CVD. Consequently, we expect our models to use these variables. The feature importance analysis highlights potentially important predictors of future cardiac events. Reassuringly, the most important features broadly align with the current standard of care. Our results confirm the high relevance of diagnosis and procedure codes as used in previous studies (Xu et al., 2019), but also indicate the benefit of integrating further EHR modalities such as laboratory values and vital signs. This might be one reason why our GRU models achieve an AUC between for predicting the risk of MI within a 12-month time horizon, in contrast to Xu et al. (2019), who reported AUC values of .
In addition to relevant features, attention is able to highlight the past observation days that were most important in a model's decision, as shown in Figure 2(b). It is reassuring to note that mtattgru focuses not only on recent observations but also on historical ones.
Although the results here presented are promising, our approach has several limitations, which we plan to address in the future. These limitations include:
- Data representation: Secondary care data is highly sparse. Hence, embeddings or graph networks may help to better represent the intrinsic structure of EHR data and improve predictions.
- Multi-task learning: Our results indicate that the mtgru model can learn similarities across time horizons, which results in superior prediction performance, in particular for scenarios with scarce training data. In the future, we plan to extend this approach to the task of predicting multiple diseases.
- Our evaluation is restricted to a single NHS trust. To increase the robustness of ML models and assess generalisability, we plan to evaluate on further datasets and potentially investigate the use of transfer learning approaches.
This contribution presents a multi-task attention RNN approach (mtattgru) for predicting MI and stroke events from the rich, longitudinal data available in EHRs. The method was evaluated using multivariate information from an NHS trust. The proposed method outperforms baseline approaches and standard clinical tools for predicting CVDs at different time horizons.
This work uses data provided by patients and collected by the NHS as part of their care and support. We believe using patient data is vital to improve health and care for everyone and would, thus, like to thank all those involved for their contribution. The data were extracted, anonymised, and supplied by the Trust in accordance with internal information governance review, NHS Trust information governance approval, and General Data Protection Regulation (GDPR) procedures outlined under the Strategic Research Agreement (SRA) and relative Data Sharing Agreements (DSAs) signed by the Trust and Sensyne Health plc.
This research has been conducted using the Oxford University Hospitals NHS Foundation Trust Clinical Data Warehouse, which is supported by the NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust. Special thanks to Kerrie Woods, Kinga Varnai, Oliver Freeman, Hizni Salih, Zuzana Moysova, Professor Jim Davies and Steve Harris.
- Choi et al. (2017). Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association 24 (2), pp. 361–370.
- Goldstein et al. (2017). Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. Journal of the American Medical Informatics Association 24 (1), pp. 198–208.
- Hippisley-Cox et al. (2008). Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ 336 (7659), pp. 1475–1482.
- Johnson et al. (2013). A new severity of illness scale using a subset of acute physiology and chronic health evaluation data elements shows comparable predictive accuracy. Critical Care Medicine 41 (7), pp. 1711–1718.
- Lipton et al. (2015). Learning to Diagnose with LSTM Recurrent Neural Networks.
- Luong et al. (2015). Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, USA, pp. 1412–1421.
- Pike et al. (2016). Improvement in cardiovascular risk prediction with electronic health records. Journal of Cardiovascular Translational Research 9 (3), pp. 214–222.
- RCP (2017). National Early Warning Score 2 (NEWS2): standardising the assessment of acute-illness severity in the NHS. Technical report, Royal College of Physicians.
- Teoh (2018). Towards stroke prediction using electronic health records. BMC Medical Informatics and Decision Making 18 (1), pp. 127.
- Xu et al. (2019). Multiple MACE risk prediction using multi-task recurrent neural network with attention. In 2019 IEEE International Conference on Healthcare Informatics (ICHI), pp. 2.