Longitudinal Electronic Health Records (EHR), which thoroughly collect patient health information over time, have proven to be one of the most relevant data sources for tasks such as early prediction of anastomosis leakage (Soguero-Ruiz et al., 2016), characterization of patient health-status (Chushig-Muzo et al., 2020), and prediction of type 2 diabetes (Garcia-Carretero et al., 2020). However, analyzing temporal EHR-based data raises many challenges. Such multivariate time series (MTS) can be characterized by missing values, different lengths and possibly dependent variables (Mikalsen et al., 2018). To deal with these issues, several methods have been proposed to exploit temporal clinical data (Mikalsen et al., 2018). Among them, we explore the potential of the time-series cluster kernel (TCK), which computes the pairwise similarities between time series with missing data. The resulting kernel matrix can be used for many different purposes, such as dimensionality reduction (DR) or classification.
Learning compressed representations of MTS makes data analysis easier in the presence of redundant data, as well as for a high number of variables and time steps. Traditional DR algorithms are designed for vectorial data. In this paper, however, we leverage the potential of TCK to map high-dimensional MTS into a much lower-dimensional space. Towards that end, representation learning, i.e., transforming the input space into a new feature representation space by linear and non-linear approaches, is considered. The learned compressed representations of MTS can be used to visually identify patients with specific clinical characteristics. In addition, this new space can serve as the input space for linear and non-linear classifiers.
The described methodology is applied in this work to identify the acquisition of antimicrobial multidrug resistance (AMR) in the Intensive Care Unit (ICU). This is a growing problem that jeopardizes seven decades of medical progress since antibiotics were first used in clinical practice (World Health Organization, 2014). The misuse and overuse of antibiotics have resulted in bacteria becoming resistant to one or more antibiotics, no longer responding to drugs that they were initially sensitive to. The lack of antimicrobial effectiveness could increase the risk when treating infections, making it impossible or extremely difficult to find a suitable treatment to cure them (World Health Organization, 2014). This situation is even more critical in the ICU due to the delicate health condition of the patients in this unit.
As a consequence, AMR is causing a significant social and economic burden worldwide (World Health Organization, 2015). Antibiotic resistance is estimated to be responsible for nearly 300 million premature deaths and considerable economic losses by 2050, according to a recent study (Munita and Arias, 2016). The overall economic cost of AMR was predicted to be approximately 1.5 billion euros, with hospital expenditures accounting for 900 million (Prestinaci et al., 2015). This paper, therefore, proposes an approach for earlier identification of the development and spread of AMR in the ICU. Towards that end, MTS associated with the use of antibiotics in this unit are analyzed.
The structure of this paper is as follows. Section 2 provides an overview of the data and the methods used in the paper. Section 3 presents the experimental results, whereas discussion and conclusions are included in Section 4.
2. Data and methods
The dataset used in the current study consisted of MTS extracted from the EHR of the ICU at the University Hospital of Fuenlabrada from 2004 until 2020. Of the 3476 patients admitted to the ICU during that period, 628 developed AMR. Each patient is characterized by MTS related to the families of antibiotics taken by that patient during his/her ICU stay, as well as the antibiotics taken by patients who shared the clinical unit during the stay of the patient under study. Moreover, we count the number of patients who shared the clinical unit and the number of AMR patients in each time slot (24 hours). We also record whether the patient has been assisted with mechanical ventilation. The families of antibiotics considered in this work are: Aminoglycosides (AMG), Antifungals (ATF), Carbapenems (CAR), 1st generation Cephalosporins (CF1), 2nd generation Cephalosporins (CF2), 3rd generation Cephalosporins (CF3), 4th generation Cephalosporins (CF4), unclassified antibiotics (Others), Glycylcyclines (GCC), Glycopeptides (GLI), Lincosamides (LIN), Lipopeptides (LIP), Macrolides (MAC), Monobactams (MON), Nitroimidazoles (NTI), Miscellaneous (OTR), Oxazolidinones (OXA), Broad-Spectrum Penicillins (PAP), Penicillins (PEN), Polypeptides (POL), Quinolones (QUI), Sulfonamides (SUL) and Tetracyclines (TTC).
On average, the first multidrug resistance is detected within seven days after patient admission to the ICU, similar to the average length of stay of non-AMR patients. Based on these results, we set the length of the longest MTS to seven days. Therefore, for patients whose stay in the ICU was shorter than seven days, we fill the missing time observations with zero values. If the length of stay is longer than seven days, we consider the information corresponding to the seven days closest to the detection of the first AMR. For non-AMR patients, and based on clinical knowledge, the patient’s admission to the ICU is the reference (see Figure 1 for details).
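The windowing rule described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `extract_window` and the argument `amr_day` are hypothetical, and two details that the text does not specify are assumed here, namely that the detection day itself is included in the window and that zero-padding is appended at the end of the series.

```python
import numpy as np

WINDOW_DAYS = 7  # maximum MTS length stated in the text


def extract_window(stay, amr_day=None, window=WINDOW_DAYS):
    """Return a (window, n_variables) array for one ICU stay.

    stay    : array of shape (n_days, n_variables), one row per 24-hour slot
    amr_day : index of the slot in which the first AMR was detected,
              or None for a non-AMR patient (reference = ICU admission)
    """
    stay = np.asarray(stay, dtype=float)
    if amr_day is None:
        # Non-AMR patient: keep the first `window` days after admission.
        segment = stay[:window]
    else:
        # AMR patient: keep the `window` days closest to the detection.
        # Assumption: the detection day is part of the window.
        end = amr_day + 1
        segment = stay[max(0, end - window):end]
    # Assumption: shorter series are zero-padded at the end.
    padded = np.zeros((window, stay.shape[1]))
    padded[:len(segment)] = segment
    return padded
```

For a ten-day stay with the first AMR detected on day index 8, the function keeps day indices 2 to 8; a three-day stay is returned with four trailing rows of zeros.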
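Hence, the dataset is represented as $\mathcal{D} = \{(\mathbf{X}_n, y_n)\}_{n=1}^{N}$, where the $n$-th patient is represented by the temporal matrix $\mathbf{X}_n$ and the output $y_n \in \{0, 1\}$, which identifies if a patient acquired (“1”) or not (“0”) an AMR during his/her stay in the ICU. The matrix $\mathbf{X}_n$ models, for the $n$-th patient, $V$ time series, each of them defined by a number of observations $T$, as follows: $\mathbf{X}_n = [\mathbf{x}_n^{1}, \ldots, \mathbf{x}_n^{V}]$, with the column vector $\mathbf{x}_n^{v}$ having length $T$ for all $v = 1, \ldots, V$ and $n = 1, \ldots, N$.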
MTS have been analyzed in a variety of application domains, such as finance or health (Chatfield, 2003). From a theoretical point of view, several studies have followed a classical approach that deals with MTS by extracting handcrafted features from raw data (Soguero-Ruiz et al., 2015, 2016). Others have focused on computing pairwise similarities between the time series, for example with dynamic time warping (DTW) (Wang et al., 2013; Mikalsen et al., 2021). However, many of these similarity measures are not suitable for kernel methods because they are not positive semi-definite.
A method known as time series cluster kernel (TCK) is employed in this study. This method is based on ensemble learning approaches and probabilistic models known as Gaussian mixture models (GMMs). GMMs are fitted to randomly chosen subsets of MTS, features and time segments by considering different numbers of mixture components and random initial conditions. To estimate model parameters (time-dependent means, covariance matrix, and the variance of the attribute) when dealing with missing data, the likelihoods are multiplied with informative priors for the parameters, and maximum a posteriori expectation-maximization is considered (Marlin et al., 2012). After convergence, the posterior probabilities of each GMM are obtained. The inner products between pairs of posterior probabilities provided by each partition are summed up to build the kernel matrix, following the ensemble strategy. Therefore, given a GMM ensemble, we compute the TCK by exploiting the fact that the sum of kernels is itself a kernel. Since the TCK procedure generates partitions at different resolutions, it can capture both local and global relationships in the underlying data; it is also robust to outliers and parameter-free. More details on the TCK are provided in (Mikalsen et al., 2018). We evaluate the potential of the learned representations (kernel) for dimensionality reduction, visualization and classification tasks.
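The ensemble step, in which inner products of posterior vectors are summed into a kernel matrix, can be illustrated with a heavily simplified sketch. It is not the TCK implementation: the posteriors below come from a stand-in mixture of equal-weight spherical Gaussians centred on randomly chosen samples, with no MAP-EM fitting, no missing-data handling and no time-segment sampling. The input is assumed to be flattened MTS of shape (patients, features), and the names `soft_assignments` and `ensemble_kernel` are hypothetical.

```python
import numpy as np


def soft_assignments(X, n_components, rng, bandwidth=1.0):
    """Stand-in for GMM posteriors (assumption, not the TCK model):
    responsibilities of an equal-weight spherical Gaussian mixture
    centred on randomly chosen data points.
    """
    centres = X[rng.choice(len(X), size=n_components, replace=False)]
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    log_resp = -0.5 * sq_dist / bandwidth**2
    log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_resp)
    return resp / resp.sum(axis=1, keepdims=True)  # rows sum to one


def ensemble_kernel(X, max_components=5, n_randomizations=3, seed=0):
    """Sum the inner products of posterior vectors over an ensemble."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    K = np.zeros((n_samples, n_samples))
    for n_components in range(2, max_components + 1):
        for _ in range(n_randomizations):
            # Random subset of features for this ensemble member.
            n_sub = rng.integers(1, n_features + 1)
            features = rng.choice(n_features, size=n_sub, replace=False)
            P = soft_assignments(X[:, features], n_components, rng)
            K += P @ P.T  # each term is positive semi-definite
    return K
```

Because every term `P @ P.T` is a Gram matrix, the accumulated `K` is symmetric and positive semi-definite, which is the property that makes the sum usable as a kernel.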
Regarding dimensionality reduction, we focus on linear and non-linear methods to represent the embedding of the EHR MTS in the TCK space. Principal Component Analysis (PCA) is considered to explore linear transformations (Anowar et al., 2021), whereas kernel PCA (KPCA) and autoencoders (AE) are considered as non-linear dimensionality reduction approaches. Note that AE are used to learn data representations in deep architectures; see (Vincent et al., 2008) for more details. To visualize data in two dimensions, we apply t-Distributed Stochastic Neighbor Embedding (t-SNE) (Van der Maaten and Hinton, 2008).
Regarding classification, the learned representation is used as the input to different classifiers. In this work, we apply linear (Logistic Regression, LR) and non-linear classifiers (k-nearest neighbour, k-NN; decision trees; random forest; support vector machines, SVM; nu-SVM; and multilayer perceptron, MLP). Due to space limitations, we do not describe the classifiers here; we refer the interested reader to (Bishop, 2006).
All experiments were performed in Python, and the AE was modelled with Keras.
3. Experimental results
This section aims to evaluate the effectiveness of the TCK by applying different dimensionality reduction techniques: PCA, KPCA and AE. After using these methods, the resulting learned representations are used for 2D visualization with t-SNE and for classification purposes. A summary of the process followed in this work is shown in Figure 2. The original dataset is separated into two subsets, training and test, which account for 70% and 30% of the patients, respectively (Caruana et al., 2015). The training set is balanced with respect to the minority class (AMR patients), and the remaining non-AMR patients are used in the test set. We apply the TCK to this dataset (Matlab code freely available in (Mikalsen, 2017)), setting the maximum number of mixture components for each Gaussian mixture model to 40 and the number of randomizations for each number of components to 30.
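One possible reading of the split described above is sketched below; the text does not give the exact procedure, so the details are assumptions. Here 70% of the AMR patients go to the training set, an equal number of non-AMR patients is drawn at random (undersampling), and every patient not selected for training is placed in the test set. The function name `balanced_split` is hypothetical.

```python
import numpy as np


def balanced_split(y, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) with a class-balanced training set.

    y : binary labels, 1 = AMR patient, 0 = non-AMR patient
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    pos = rng.permutation(np.flatnonzero(y == 1))  # AMR patients
    neg = rng.permutation(np.flatnonzero(y == 0))  # non-AMR patients
    n_pos_train = int(round(train_fraction * len(pos)))
    # Assumption: undersample the majority class to the minority size.
    n_neg_train = min(n_pos_train, len(neg))
    train_idx = np.concatenate([pos[:n_pos_train], neg[:n_neg_train]])
    # Every patient not selected for training goes to the test set.
    test_idx = np.concatenate([pos[n_pos_train:], neg[n_neg_train:]])
    return rng.permutation(train_idx), rng.permutation(test_idx)
```

Under this reading, the 628 AMR and 2848 non-AMR patients reported in Section 2 would yield a training set of 440 patients per class, with the remaining 188 AMR and 2408 non-AMR patients in the test set.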
Dimensionality reduction and visualization.
To visually evaluate the potential of TCK as a kernel when dealing with MTS, we benchmark PCA with TCK, KPCA with TCK and AE with TCK. For PCA, we retain 99% of the information of the original space, ending up with 16 principal components. For KPCA, we consider a polynomial kernel, 50 principal components and a gamma value of 0.002083. These hyperparameters are tuned by minimizing the mean squared error between the original and the compressed space on the validation set. The same criterion is applied to the AE, for which a LeakyReLU activation function is used in all layers except the last one, where a sigmoid is considered. The mean squared error was used as the loss function. The AE is trained for 1000 epochs with an Adam optimizer and exponential learning rate decay. Several simple and deep AE were evaluated, showing that an architecture with 712 hidden neurons and 250 neurons in the compressed space is the best one to identify AMR patients. Keras in TensorFlow has been used for this implementation.
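The PCA step with a variance threshold can be sketched in plain NumPy as follows. This is an illustrative sketch, not the authors' code: it assumes that each patient is described by its row of the TCK matrix (similarities to the training patients) and interprets "99% of the information" as 99% of the cumulative explained variance. The function name `pca_by_variance` is hypothetical.

```python
import numpy as np


def pca_by_variance(K_train, K_test, variance_to_keep=0.99):
    """Project kernel-matrix rows onto the leading principal components.

    K_train : (n_train, n_train) similarities among training patients
    K_test  : (n_test, n_train) similarities of test to training patients
    """
    mean = K_train.mean(axis=0)
    # SVD of the centred training rows; rows of Vt are principal directions.
    _, s, Vt = np.linalg.svd(K_train - mean, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance
    # reaches the requested fraction.
    n_components = int(np.searchsorted(explained, variance_to_keep) + 1)
    W = Vt[:n_components].T
    return (K_train - mean) @ W, (K_test - mean) @ W, n_components
```

The test rows are centred with the training mean and projected with the training directions, so no information from the test set enters the fitted transformation.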
The new representation spaces are considered as input to t-SNE, aiming to visualize patients in two dimensions. These visualizations are shown in Figure 3 (a) for PCA, in Figure 3 (b) for KPCA, and in Figure 3 (c) for AE. The learned representations provide knowledge for AMR patient identification. A distinguishable cluster (colored mainly in green) is observed in Figures 3 (a), (b) and (c), composed of 157, 157 and 161 patients, respectively. The patients grouped in the cluster in Figures 3 (a) and (b) are part of the patients observed in the cluster shown in Figure 3 (c). It is important to highlight that, in this cluster, the majority (139) are patients with AMR detected in the first 48 hours of ICU stay, of whom 61.87% required mechanical ventilation, compared to 76.68% of AMR patients not within the cluster. Along the same lines, AMR patients outside the cluster require more antibiotic treatments (see Figure 4 (a) for details). This may indicate that their health status is more critical. Furthermore, it can be observed that, in general, non-AMR patients take fewer antibiotics than AMR patients, except for families of antibiotics such as PEN and CF3 (see Figure 4 (a) and (b) for details).
The representations learned by PCA, KPCA and AE are used as the input of different linear and non-linear classifiers, specifically, LR, k-NN, decision tree, random forests, SVM, nu-SVM and MLP. The metrics used to measure the performance of the classifiers are accuracy, specificity, sensitivity, and area under the curve (AUC). To tune the hyperparameters, a 5-fold cross-validation strategy was applied to the training set. Results on the test set are shown in Table 1. Note that, in general, AE is the most adequate DR approach. It can also be observed that linear classifiers perform well in terms of sensitivity and AUC, whereas non-linear classifiers, such as nu-SVM, provide better accuracy and specificity results.
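For reference, the four metrics can be computed from true labels and classifier scores as in the sketch below. It assumes that AMR is the positive class, that AUC refers to the area under the ROC curve, and that scores are thresholded at 0.5; the function name `binary_metrics` is hypothetical.

```python
import numpy as np


def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity and ROC AUC for binary labels."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    # AUC as the probability that a random positive scores higher
    # than a random negative (ties count one half).
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": auc,
    }
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC is computed from the ranking of the scores alone, which is one reason the two groups of metrics can favour different classifiers.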
4. Discussion and conclusion
This work presents a promising approach for early identification of AMR patients in the ICU based on MTS recorded in EHR. The following are some of our contributions:
- A time series cluster kernel is used to compute similarity measures for MTS with missing data.
- Compressed representations that preserve pairwise relationships allow clinicians to visually identify the acquisition of AMR in the ICU.
- Classification results considering the learned representation as input space suggest that the proposed methodology can be used for earlier detection of AMR.
Learning compressed representations of the TCK space based on linear and non-linear approaches provides a promising visualization for identifying a specific group of AMR patients who acquired the AMR during the first 48 hours of their stay in the ICU. This allows anticipating the culture results and taking isolation measures to avoid further spreading to other patients in the unit. The experimental results also show good classification capabilities, shedding some light on the antibiotic treatments used for AMR patients.
The potential of deep autoencoders in this study opens the way for exploring more complex AE such as denoising or variational autoencoders (Doersch, 2016). Future work also includes the possibility of considering this problem as a multiclass classification problem rather than a binary one, aiming to distinguish between AMR detected in the first 48 hours, AMR detected later and non-AMR patients.
This work has been partly supported by the Spanish Research projects PID2019-107768RA-I00 (AAVis-BMR), PID2019-106623RB-C41 (Beyond), DTS17/00158, and Project Ref. F661 (Mapping-UCI) by the Community of Madrid and the Rey Juan Carlos University.
- Anowar et al. (2021). Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Computer Science Review 40, pp. 100378.
- Bishop (2006). Pattern recognition and machine learning. Springer.
- Caruana et al. (2015). Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. pp. 1721–1730.
- Chatfield (2003). The analysis of time series: an introduction. Chapman and Hall/CRC.
- Chushig-Muzo et al. (2020). Data-driven visual characterization of patient health-status using electronic health records and self-organizing maps. IEEE Access 8, pp. 137019–137031.
- Doersch (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
- Garcia-Carretero et al. (2020). Use of a k-nearest neighbors model to predict the development of type 2 diabetes within 2 years in an obese, hypertensive population. Medical & Biological Engineering & Computing 58 (5), pp. 991–1002.
- Marlin et al. (2012). Unsupervised pattern discovery in electronic health care data using probabilistic clustering models. In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, pp. 389–398.
- Mikalsen (2017). Time series cluster kernel (TCK) Matlab implementation.
- Mikalsen et al. (2018). Time series cluster kernel for learning similarities between multivariate time series with missing data. Pattern Recognition 76, pp. 569–581.
- Mikalsen et al. (2021). Time series cluster kernels to exploit informative missingness and incomplete label information. Pattern Recognition 115, pp. 107896.
- Munita and Arias (2016). Mechanisms of antibiotic resistance. Virulence mechanisms of bacterial pathogens, pp. 481–511.
- World Health Organization (2014). Antimicrobial resistance global report on surveillance: 2014 summary. Technical report, World Health Organization.
- Prestinaci et al. (2015). Antimicrobial resistance: a global multifaceted phenomenon. Pathogens and Global Health 109 (7), pp. 309–318.
- Soguero-Ruiz et al. (2015). Data-driven temporal prediction of surgical site infection. In AMIA Annual Symposium Proceedings, Vol. 2015, pp. 1164.
- Soguero-Ruiz et al. (2016). Predicting colorectal surgical complications using heterogeneous clinical data and kernel methods. Journal of Biomedical Informatics 61, pp. 87–96.
- Van der Maaten and Hinton (2008). Visualizing data using t-SNE. Journal of Machine Learning Research 9 (11).
- Vincent et al. (2008). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103.
- Wang et al. (2013). Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery 26 (2), pp. 275–309.
- World Health Organization (2015). Global Action Plan on Antimicrobial Resistance. pp. 28.