Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection. Each year, at least 1.7 million adults develop sepsis. Patients with sepsis are at considerable risk of severe complications and death; one in three hospital deaths are due to sepsis. The cornerstones of sepsis treatment are antibiotic administration and fluid resuscitation to correct hypotension. Studies have shown that the type of fluid used for resuscitation is correlated with mortality, and that prescribing excessive quantities of intravenous (IV) fluids to septic patients can be detrimental. Because IV fluids are central to managing sepsis patients, it is critical to understand what type and what amount of fluid should be administered. In this study, we develop a prescriptive clinical model that deduces optimal, patient-specific IV fluid values for the treatment of sepsis.
The Surviving Sepsis Campaign (SSC), an international initiative, recommends early goal-directed therapy (EGDT), which underlines the importance of rapid volume resuscitation. Among septic patients, IV therapy is employed as a volume expander to restore intravascular volume and keep body tissues oxygenated. There are two types of volume expanders: crystalloids and colloids. Since colloids do not show clear benefits over crystalloids in treating critically ill patients despite their higher cost, we aim to find the right type of crystalloid. The types of crystalloids considered in this study are: Dextrose (a synonym for glucose) 10% in Water (D10W), Dextrose 5% (D5W), Dextrose 5% in Normal Saline (D5NS), Dextrose 5% in Half Normal Saline (D5HNS), Dextrose 5% in Lactated Ringer’s (D5LR), Normal Saline (NS), Half Normal Saline (HNS), and Lactated Ringer’s (LR). Due to the complexity of patient physiology, there is no consensus on treatment strategy, and this lack of treatment standards further complicates the treatment process for inexperienced junior doctors.
In hospitals, junior doctors are usually responsible for initiating the immediate treatment of patients with severe sepsis, for two reasons. First, on-call systems are designed to contact junior doctors first when a patient’s condition deteriorates (often triggered by an early warning system). Second, junior doctors are often the first to attend to patients following hospitalization. SSC guidelines recommend varying the amount of fluid infused at different disease severity levels, yet one study showed that 20% of all adults receiving fluid resuscitation experience complications owing to inadequate quantities of fluid. By the time senior doctors review the treatment, the early intervention window, which is critical for sepsis treatment, may already have passed. Therefore, developing a clinical tool that can help junior doctors deduce an optimal treatment is critical.
Among the most significant barriers to clinical models being accepted in the medical community are interpretability and explainability: medical practitioners expect a model to be transparent and to provide meaningful recommendations. Part of the novelty of our proposed method is that the model considers the physician’s own treatment recommendation when eliciting an optimized recommendation. Furthermore, the model can be parameterized to deviate only a small amount from the physician-provided values. Thus, our physician-in-the-loop formulation is designed to give medical practitioners recommendations they can trust.
The method is designed to provide recommendations that are tailored to each patient’s physiology. Patient physiology is characterized using clinical data available in Electronic Health Records (EHR), which are collected throughout a patient’s hospital stay. Figure 1 illustrates a patient’s flow through a hospital system and highlights the contribution of this paper. Various clinical signs can be collected as the patient progresses through the hospital system. A patient with a suspected infection enters the hospital, where a primary assessment of disease criticality is performed. Depending on the severity of the disease, the patient may be recommended for hospitalization. Alternatively, a patient who has already been hospitalized may develop signs of infection during the inpatient stay. Following the SSC guidelines, the physician can recommend a specific amount of IV fluids. Our model combines this physician recommendation with the patient’s clinical and demographic measurements to determine the optimal quantity of IV fluids, i.e., the quantity that minimizes the predicted probability of death.
This study makes the following contributions:
Formulates a generalized optimization model to prescribe the optimal, patient-specific treatment.
Develops a tool to derive the optimal quantity of IV fluids for septic patients in ICUs.
Highlights the performance differences between decision making with and without human (physician) interaction, using the proposed prescriptive model.
Proposes a feature selection procedure that integrates well with the established modeling framework.
Augments the theoretical literature on inverse classification.
Our proposed framework includes two steps. First, the classifier that best predicts mortality is selected from a set of candidate models; we employ logistic regression and deep learning classifiers in our experiments. Second, the selected model is embedded into an optimization formulation that takes patient data and physician-provided fluid resuscitation recommendations as input to derive the optimal IV fluid recommendations.
The rest of this paper is structured as follows. Section 2 reviews relevant literature, including existing IV fluid-based sepsis treatment recommendation models and applications of inverse classification to healthcare problems. Section 3 describes the proposed method in detail. Section 4 introduces the EHR data utilized in our experiments to evaluate our methodology. Section 5 presents the experimental settings and numerical results. Finally, Section 6 summarizes the contribution of this study.
2 Literature Review
Due to its complex pathophysiology, sepsis poses many treatment challenges. As a result, the World Health Organization passed a resolution to improve the definition, diagnosis, and treatment of sepsis. In its guidelines, the SSC advocates initial fluid resuscitation to treat sepsis effectively. The guidelines specify that within three hours of sepsis detection, the patient should be prescribed 30 mL/kg of IV crystalloid fluid. However, the administered quantity should also take into account the patient’s body mass and other important vital signs. Because of the threat sepsis poses to human life, many studies have emphasized that the fluid amount should be considered critically rather than reflexively. Chang and Holcomb (2016) study the different fluid resuscitation options available for treating sepsis patients; the study discusses the basic physiology and science of intravenous fluids and compares balanced and unbalanced crystalloids. Durairaj and Schmidt (2008) indicate in their review that excessive fluid resuscitation can lead to adverse outcomes, and recommend the use of a dynamic index to determine the fluid amount. This recommendation could be realized by using optimization models to prescribe patient-specific fluid amounts.
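As a worked illustration of the weight-based SSC volume, consider a hypothetical 70 kg patient (an arbitrary example weight, not a value drawn from this study):

$$V = 30\ \tfrac{\text{mL}}{\text{kg}} \times 70\ \text{kg} = 2100\ \text{mL}$$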
Some researchers have applied machine learning techniques to determine optimal sepsis treatment strategies. Komorowski et al. (2018) proposed a model that derives an optimal treatment strategy for sepsis using reinforcement learning; the model learns the optimal treatment policy by analyzing a large number of historical treatment decisions. In a preprint, Raghu et al. (2017) extended this line of work by employing a continuous state-space approach to derive an optimal sepsis treatment strategy. The authors optimize the quantities of IV fluids and vasopressors so that the treatment outcome is improved.
An optimization-driven tool could support junior doctors, who are usually among the first to treat patients, in determining optimal IV fluid treatments. Courtney et al. (2014) studied the prescription of intravenous fluids by junior doctors and found that the majority do not prescribe the right amount of fluid and fail to adjust the volume for body mass. These results highlight the importance of a convenient, human (physician) in the loop method that can aid junior physicians in making the right IV fluid prescription.
A few studies have shown that early fluid resuscitation is associated with reduced mortality. Lee et al. (2014) studied the correlation between mortality and aggressive, optimal fluid resuscitation administered during the early stages of treatment, and found that earlier fluid resuscitation administered in adequate quantities is associated with decreased mortality. Cohen et al. (2015) suggested that, in the absence of any specific anti-sepsis treatment, the early administration of antibiotics and the optimal infusion of fluids are critical. Our study designs a framework that can alert physicians to the risk of mortality during the early stages of a septic episode and assist them in prescribing an optimal treatment strategy using our inverse classification framework.
Recommender systems attempt to find the items, services, products, etc. with which an individual will be most satisfied. Such methods are typically implemented in environments that offer an over-abundance of choices, which can lead users to a state of “information overload”. For instance, the movie and TV streaming website Netflix employs a recommender system to filter through thousands of entertainment options and display those the customer will most enjoy. Notions such as “most satisfied” and “most enjoyed” are expressed in terms of some numerical measure. On Netflix, for instance, users express whether or not they enjoyed a particular TV show or movie by “liking” or “disliking” it, and the recommender system leverages a user’s past “likes” and “dislikes” to recommend not-yet-watched titles that align with the user’s preferences. Broadly speaking, the recommender systems literature can be decomposed into three categories: collaborative filtering methods, which make recommendations based on user information; content-based filtering methods, which make recommendations based on content information; and hybrid filtering methods, which make recommendations based on both user and content information.
In recent years, recommender systems methodology has been incorporated into a deep learning paradigm. Deep learning methods can be collaborative, content-based, or hybrid depending on the type of information used as input, and are best categorized by the type of neural network architecture they employ. Deep recommender systems employing convolutions, recurrent units, “vanilla” MLPs, and auto-encoders have all been explored, as have graph-based methods [48, 3].
Although recommender systems methodology is relevant to our problem setting, it is not well suited to it, since an over-abundance of choices is not the issue here. A related recommendation technology is inverse classification. Inverse classification uses an induced classification model to find the feature value perturbations that optimize a particular outcome of interest (as measured by the induced classifier); these feature value perturbations constitute the recommendations. For instance, a model might be induced to learn the mapping from patient characteristics, such as age, blood pressure, and fluid amounts, to a disease outcome of interest, such as surviving or dying from sepsis (as we do in this paper). Inverse classification then uses this model to find the perturbations to a new patient’s feature values (e.g., the IV fluids prescribed by the physician) that minimize the probability of death due to sepsis.
Inverse classification has previously been applied to a variety of domains, including cardiovascular disease risk mitigation [10, 47, 29, 30, 31], hiring in nurseries, bankruptcy prediction and alleviation, diabetes, and student classroom performance [29, 30, 31]. Several works use inverse classification to explain rather than to prescribe [2, 32]. Methodologically, past inverse classification works can be stratified into those that employ constraints [2, 10, 29, 30, 35, 47] and those that do not [1, 37]. Constraints encourage solutions that are real-world feasible and personalizable (i.e., can be tailored to each individual’s preferences) and are therefore desirable. Furthermore, some inverse classification methods are model-specific [1, 2, 10, 35, 37, 47], relying on a specific predictive model, while others are model-agnostic [29, 30, 31]. If a particular model is known to map the probability space of a problem accurately, a model-specific inverse classification method may be most appropriate. However, the most accurate model is rarely known a priori, which is why model-agnostic inverse classification methods are generally preferred.
Several studies have shown that combining human interaction with machine learning methods can positively influence decision making. Holzinger et al. (2016) performed numerical experiments showing the advantages of human interaction in solving optimization problems and concluded that a human in the loop can significantly improve both the solution and the computation time. Duhaime (2016) advocates integrating humans and artificial intelligence to solve healthcare problems, claiming that such models are more reliable and trustworthy because they blend physicians’ knowledge with artificial intelligence.
3 Proposed Methodology
In this section we describe our proposed human-in-the-loop method for eliciting optimal, patient-specific fluid resuscitation amounts that reduce the probability of death due to sepsis. The proposed prescriptive model first determines the best predictive model to estimate the probability of mortality. The selected model is then embedded in an optimization formulation to derive the optimal amount of IV fluid. We describe the selection of the predictive model in Section 3.1 and the optimization formulation in Section 3.2. Before describing our framework, we first introduce some relevant notation.
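Let $\mathcal{D} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$ denote a dataset of $n$ instances (patient visits), where $\mathbf{x}^{(i)} \in \mathbb{R}^{p}$ and $y^{(i)} \in \{0, 1\}$. A value of $y^{(i)} = 1$ indicates that a patient has died of sepsis and $y^{(i)} = 0$ indicates that a patient has survived.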
Here, $p$ is the size of the feature vector, i.e., the number of clinical variables. These include demographic measurements, such as age; vitals, such as blood pressure; lab-based measurements, such as serum creatinine; and the physician-prescribed treatment measurements. However, only specific feature categories can actually be manipulated to affect (i.e., change) the final outcome (survival or death). For instance, a patient’s age cannot be changed, but the amounts of the various IV fluids administered to the patient can be changed to improve the probability of survival. Additionally, while certain measurement categories cannot be directly manipulated, the values of the features in these categories may depend on, and vary with, feature values in other categories. For instance, blood pressure may be a function of both age and the administration of certain drugs.
We first ascribe notation to these different categories. Let $U \subseteq \{1, \dots, p\}$ denote the indices that correspond to unchangeable features, such as demographic information, $D$ the indices corresponding to directly changeable features, such as IV fluids, and $I$ the indices of indirectly changeable features, such as vitals and lab-based measurements. Using these index sets, the feature vector can be decomposed to refer to specific feature values, i.e., $\mathbf{x} = (\mathbf{x}_U, \mathbf{x}_D, \mathbf{x}_I)$. The feature sets corresponding to $U$, $D$, and $I$ are denoted $\mathcal{F}_U$, $\mathcal{F}_D$, and $\mathcal{F}_I$, respectively; this notation is used predominantly in Section 4.3 to explain feature selection.
3.1 Predicting Sepsis Outcomes
With the above preliminaries explained, an initial model that provides probabilistic estimates of sepsis outcomes is formulated as follows:
$$\hat{p}(y = 1 \mid \mathbf{x}) = \sigma\big(f(\mathbf{x})\big) = \frac{1}{1 + e^{-f(\mathbf{x})}} \tag{1}$$
Here, $\sigma$ is the logistic sigmoid function applied to another function $f$ that takes $\mathbf{x}$ as input. Such a formulation allows flexibility in defining the model used to make predictions, while still ensuring that probabilistic outputs are obtained. A model that outputs probabilities is important because we wish to minimize the probability of a negative outcome directly, rather than minimizing over a set of discrete outcomes such as $\{0, 1\}$. When a logistic regression model is used, the function $f$ can be expressed as a linear combination of learned parameters and an instance feature vector:
$$f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b \tag{2}$$
where $\mathbf{w} \in \mathbb{R}^{p}$ and $b \in \mathbb{R}$ are the learned parameters. If $f$ is a more complex model that can still be trained to learn a probability mapping, such as a neural network, the function can be expressed as:
$$f(\mathbf{x}) = \mathbf{w}_{L}^{\top}\,\phi_{L-1}\Big(W_{L-1}\,\phi_{L-2}\big(\cdots\,\phi_{1}(W_{1}\mathbf{x})\cdots\big)\Big) \tag{3}$$
where $\mathbf{w}_L$ is a parameter vector for the $L$th (i.e., output) layer, $W_1, \dots, W_{L-1}$ are parameter matrices associated with the hidden layers, and $\phi_1, \dots, \phi_{L-1}$ are each some arbitrary, non-linear activation function (e.g., Rectified Linear Unit (ReLU), sigmoid, etc.). An activation function is a non-linear function that transforms its input in a way that is loosely meant to mimic the firing of a neuron. The ReLU activation function, for instance, is defined as:
$$\phi(z) = \max(0, z)$$
When either (2) or (3) is selected for use as $f$, then $\sigma$ and $f$ are learned jointly through gradient-based optimization, e.g., gradient descent, projected gradient descent, or (in the case of neural networks) stochastic gradient descent. If the selected learning algorithm does not natively provide probability estimates, however (e.g., $f$ is an SVM model), then $\sigma$ must be learned separately from $f$, and Equation (1) reduces to a specific application of Platt Scaling.
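The following NumPy sketch evaluates Equation (1) with either the logistic score of Equation (2) or the ReLU network of Equation (3). It is an illustrative forward pass only: the function names are hypothetical, the weights would come from training, and bias terms are omitted from the network to mirror Equation (3).

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    """Rectified Linear Unit, applied elementwise: max(0, z)."""
    return np.maximum(0.0, z)


def f_logistic(x, w, b):
    """Equation (2): linear score w^T x + b."""
    return w @ x + b


def f_mlp(x, hidden_weights, w_out):
    """Equation (3): hidden layers W_1, ..., W_{L-1} with ReLU activations,
    followed by the output-layer vector w_L (no bias terms, as in (3))."""
    h = x
    for W in hidden_weights:
        h = relu(W @ h)
    return w_out @ h


def death_probability(x, f, **params):
    """Equation (1): p(y = 1 | x) = sigma(f(x)) for any score function f."""
    return sigmoid(f(x, **params))
```

Because `death_probability` only wraps a score function, swapping logistic regression for a network (or any other differentiable scorer) does not change the downstream optimization code.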
Defining $\sigma$ and $f$ as generally as we have allows us to search a considerably larger hypothesis space $\mathcal{H}$ when attempting to find the optimal model to use in our inverse classification framework. The model selection process can be briefly expressed as follows:
$$f^{*} = \arg\max_{f \in \mathcal{H}} \; M\big(f;\, \mathcal{D}_{\text{val}}\big) \tag{4}$$
where each trained classifier $f$ is evaluated on the validation set $\mathcal{D}_{\text{val}}$ according to the classification measure $M$. In our experiments, classification performance is measured using accuracy and the Area Under the Receiver Operating Characteristic Curve (AUC), explained in Section 5.2. The enlarged search space $\mathcal{H}$ is significant because the quality of our life-saving recommendations is directly tied to the quality of the model and the accuracy of its probability space mapping.
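A minimal sketch of the selection rule in Equation (4), assuming each candidate has already been trained and is exposed as a callable that returns death probabilities. AUC is computed here through its Mann-Whitney interpretation (the probability that a randomly chosen positive instance is scored above a randomly chosen negative one, counting ties as one half); `select_model`, `auc`, and `accuracy` are hypothetical helper names, not part of the authors’ code.

```python
import numpy as np


def auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic (ties count one half)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of instances whose thresholded probability matches the label."""
    return np.mean((np.asarray(scores) >= threshold) == np.asarray(y_true))


def select_model(candidates, X_val, y_val, metric=auc):
    """Equation (4): pick the trained candidate with the best validation score.

    candidates: dict mapping a model name to a callable that returns
                predicted death probabilities for each row of X_val.
    """
    scores = {name: metric(y_val, predict(X_val)) for name, predict in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Passing `metric=accuracy` instead of the default selects by accuracy, the other measure named above.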
3.2 Inverse Classification for Optimal Dose Prescriptions
Using the general model defined in Equation (1), we can obtain probabilistic estimates of death due to sepsis for some instance $\mathbf{x}$. This model forms the basis for our method of eliciting optimal, patient-specific fluid resuscitation recommendations, which we discuss in this subsection.
Our recommendation formulation is based on inverse classification, which is the process of manipulating the feature values of an instance in order to minimize the probability of an undesirable outcome. In this case, the instances are “sepsis patients” and the undesirable outcome is “death”. Therefore, our formulation minimizes the probability of death, which can be expressed as:
$$\min_{\mathbf{x}} \; \sigma\big(f(\mathbf{x})\big) \tag{5}$$
where $\mathbf{x}$ is a test instance whose feature values are freely manipulable by the optimization process.
The initial formulation, Equation (5), does not include real-world feasibility considerations, however. For instance, it would make little sense to allow the formulation to manipulate unchangeable feature values, such as age. Moreover, the extent of changes allowed by Equation (5) is unbounded and may, as a result, produce nonsensical recommendations, such as negative fluid amounts. Finally, Equation (5) does not take into account variable dependencies and interactions. As an example, blood pressure, an indirectly changeable feature in $\mathbf{x}_I$, is a function of the unchangeable features $\mathbf{x}_U$, such as age, and the directly changeable features $\mathbf{x}_D$, representing IV fluids. Since we are manipulating $\mathbf{x}_D$ (i.e., IV fluids), we can expect blood pressure to change as a result, and because blood pressure has an impact on whether a patient lives or dies, it is critical to capture these dependencies during optimization.
To account for the dependencies that exist between $(\mathbf{x}_U, \mathbf{x}_D)$ and $\mathbf{x}_I$, as discussed in the preceding paragraph, we propose to use a so-called indirect feature estimator (IFE). Let $g$ denote a function that takes the $\mathbf{x}_U$ and $\mathbf{x}_D$ features as input and provides estimates for the $\mathbf{x}_I$ features, i.e.,
$$\hat{\mathbf{x}}_I = g(\mathbf{x}_U, \mathbf{x}_D) \tag{6}$$
Using $g$, we can account for how changes to the $\mathbf{x}_D$ features affect the $\mathbf{x}_I$ features. The IFE can be any differentiable regression model and is therefore also fairly flexible. We explicitly require a differentiable model because our optimization methodology relies on gradient information. In this study, we employ linear regression and neural networks. If the differentiability assumption were relaxed, the optimization methodology would need to be adjusted accordingly (e.g., by using heuristic optimization). Relaxing this assumption may broaden the IFE hypothesis space at the cost of optimality guarantees.
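As an illustration of the linear-regression IFE, the sketch below fits $g$ by ordinary least squares with an intercept. The class name `LinearIFE` and its methods are hypothetical. The `jacobian_D` method shows why differentiability matters: for a linear $g$, $\partial \hat{\mathbf{x}}_I / \partial \mathbf{x}_D$ is simply a block of the coefficient matrix, which is the quantity a gradient-based optimizer needs.

```python
import numpy as np


class LinearIFE:
    """Hypothetical linear indirect feature estimator g(x_U, x_D) -> x_I,
    fit by ordinary least squares with an intercept column."""

    def fit(self, X_U, X_D, X_I):
        # Design matrix [X_U, X_D, 1]; one column of coefficients per x_I feature.
        A = np.hstack([X_U, X_D, np.ones((X_U.shape[0], 1))])
        self.coef_, *_ = np.linalg.lstsq(A, X_I, rcond=None)
        self.n_U, self.n_D = X_U.shape[1], X_D.shape[1]
        return self

    def predict(self, x_U, x_D):
        """Equation (6): estimate x_I for a single instance."""
        return np.concatenate([x_U, x_D, [1.0]]) @ self.coef_

    def jacobian_D(self):
        """d x_I / d x_D as an array of shape (|D|, |I|); constant for a linear g."""
        return self.coef_[self.n_U:self.n_U + self.n_D]
```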
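Considering that 1) the $\mathbf{x}_I$ feature values are governed by $\mathbf{x}_U$ and $\mathbf{x}_D$ and 2) only the $\mathbf{x}_D$ feature values can be manipulated, the naive inverse classification formulation of (5) can be transformed into:
$$\min_{\mathbf{x}_D} \; \sigma\Big(f\big(\mathbf{x}_U, \mathbf{x}_D, g(\mathbf{x}_U, \mathbf{x}_D)\big)\Big) \tag{7}$$
where $\mathbf{x}_D$ are now the decision variables (reflecting the fact that only these variables should be changed).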
While Equation (7) is an improvement over Equation (5), some necessary considerations are still missing from the formulation. Namely, Equation (7) is unconstrained, which may allow the formulation to produce nonsensical recommendations, such as negative fluid resuscitation amounts. Additionally, since a key component of our method is human-in-the-loop functionality (i.e., “doctor-in-the-loop” functionality), we wish to restrict the extent of the manipulations made to the expert-specified feature values. More explicitly, our rationale is that a doctor’s prescribed fluid resuscitation amounts reflect a coarse-grained recommendation. Our inverse classification method then refines this initial, doctor-specified recommendation to provide a fine-grained, precise recommendation that does not deviate “too far” from the one specified by the doctor.
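Therefore, we update Equation (7) by adding feasibility constraints to prevent nonsensical recommendations as follows:
$$\begin{aligned}
\min_{\mathbf{x}_D} \quad & \sigma\Big(f\big(\mathbf{x}_U, \mathbf{x}_D, g(\mathbf{x}_U, \mathbf{x}_D)\big)\Big) \\
\text{s.t.} \quad & \|\mathbf{x}_D - \bar{\mathbf{x}}_D\|_1 \le B \\
& 0 \le x_j \le 1, \quad \forall j \in D
\end{aligned} \tag{8}$$
where $\|\cdot\|_1$ denotes the $\ell_1$ norm and $B \ge 0$ is a budget term that controls the extent of the recommendations allowed. Note that $\mathbf{x}_D$ are the updated feature values and $\bar{\mathbf{x}}_D$ are the physician-provided feature values.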
The budget $B$ is best calibrated to each physician user in an offline setting, prior to deployment. If a physician is more experienced, $B$ can likely be smaller than for a less experienced physician. A smaller $B$ produces smaller recommended changes, since the cumulative change recommended is restricted.
The last line of Equation (8) specifies that recommendations must be non-negative and less than or equal to one. The latter condition is based on the assumption that all features have been normalized to a zero-one range, and it further encourages the process to produce real-world feasible recommendations. Figure 2 illustrates the inverse classification formulation in terms of the feature value categories. Note that the human-in-the-loop (HITL) treatments denote the physician-provided IV fluid values.
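The final formulation, Equation (8), can be optimized using projected gradient descent (PGD), an efficient gradient-based optimization method. The update to the $\mathbf{x}_D$ feature values at each iteration of PGD can therefore be expressed as:
$$\mathbf{x}_D^{(t+1)} = \Pi_{\mathcal{X}}\Big(\mathbf{x}_D^{(t)} - \eta\,\nabla_{\mathbf{x}_D}\,\sigma\big(f(\mathbf{x}_U, \mathbf{x}_D^{(t)}, g(\mathbf{x}_U, \mathbf{x}_D^{(t)}))\big)\Big) \tag{9}$$
where $\Pi_{\mathcal{X}}$ is the projection operator that projects its input onto the feasible region $\mathcal{X} = \{\mathbf{x}_D : \|\mathbf{x}_D - \bar{\mathbf{x}}_D\|_1 \le B,\ 0 \le x_j \le 1\ \forall j \in D\}$, $t$ is the iteration index, and $\eta > 0$ is a learning rate. Note that the gradient depends in part upon the estimate of the $\mathbf{x}_I$ feature values produced by $g$. Further note that the projection by $\Pi_{\mathcal{X}}$ onto the feasible region, dictated by the constraints in (8), can be computed efficiently and always succeeds as long as $\mathcal{X} \neq \emptyset$.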
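To make Equations (8) and (9) concrete, here is a hedged PGD sketch under several assumptions of ours: $f$ is logistic regression, $g$ is linear ($g(\mathbf{x}_U, \mathbf{x}_D) = A_U\mathbf{x}_U + A_D\mathbf{x}_D + \mathbf{a}_0$), the budget is measured with the $\ell_1$ norm, and a fixed step size and iteration count are used. The projection onto the box intersected with the $\ell_1$ ball soft-thresholds each coordinate toward $\bar{\mathbf{x}}_D$, clips to $[0, 1]$, and finds the threshold by bisection, which is one standard way to compute it. All function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def project(v, center, budget, lo=0.0, hi=1.0, iters=100):
    """Euclidean projection of v onto {x : lo <= x <= hi, ||x - center||_1 <= budget}.
    Assumes lo <= center <= hi (true for min-max scaled physician values)."""
    d = v - center

    def shrink(lam):
        # Soft-threshold each coordinate toward `center` by lam, then clip to the box.
        return np.clip(center + np.sign(d) * np.maximum(np.abs(d) - lam, 0.0), lo, hi)

    x = shrink(0.0)
    if np.abs(x - center).sum() <= budget:
        return x  # plain box projection already respects the budget
    lam_lo, lam_hi = 0.0, np.abs(d).max()
    for _ in range(iters):  # bisection on the multiplier of the l1 constraint
        lam = 0.5 * (lam_lo + lam_hi)
        if np.abs(shrink(lam) - center).sum() > budget:
            lam_lo = lam
        else:
            lam_hi = lam
    return shrink(lam_hi)  # lam_hi side is always feasible


def pgd_recommend(x_U, x_D_bar, w_U, w_D, w_I, b, A_U, A_D, a0, budget,
                  lr=0.1, steps=200):
    """PGD for Equation (8) with f(x) = w_U.x_U + w_D.x_D + w_I.x_I + b
    and a linear IFE x_I = A_U x_U + A_D x_D + a0."""
    x_D = x_D_bar.astype(float)  # start from the physician-provided values
    for _ in range(steps):
        x_I = A_U @ x_U + A_D @ x_D + a0  # Equation (6)
        p = sigmoid(w_U @ x_U + w_D @ x_D + w_I @ x_I + b)
        # Chain rule through sigma, f, and g: dp/dx_D = p(1 - p)(w_D + A_D^T w_I).
        grad = p * (1.0 - p) * (w_D + A_D.T @ w_I)
        x_D = project(x_D - lr * grad, x_D_bar, budget)  # Equation (9)
    return x_D
```

Starting from $\bar{\mathbf{x}}_D$ is a natural choice here: whenever $\bar{\mathbf{x}}_D \in [0, 1]$ and $B \ge 0$, the physician’s values are themselves feasible, so $\mathcal{X}$ is non-empty and every iterate stays inside it.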
4 Data Preparation
This study extracted EHR data from MIMIC-III (Medical Information Mart for Intensive Care III) to demonstrate the performance of the proposed methodology. These freely available, de-identified electronic health records cover over sixty thousand (61,532) stays in the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center between 2001 and 2012. We selected the adult patient visits with a diagnosis of sepsis for use in our study.
4.2 Data Pre-processing
Several pre-processing steps were taken to curate our final dataset; Figure 3 illustrates these data preparation steps. The data, initially spread across multiple tables, are merged to consolidate information about patients, clinical observations, and treatments. A particular challenge when pre-processing medical data is the plethora of terms conveying the same meaning (synonyms), which must be accounted for when deriving a suitable dataset for this study. For example, respiratory rate is recorded as Respiratory Rate, Breath rate, Res. rate, etc.
For each patient visit, the clinical variables are recorded longitudinally (i.e., are measured across time). We aggregate the longitudinal data by computing the mean of each clinical variable for each visit; the same aggregation is applied to IV fluids. Any missing data are imputed using multiple imputation by chained equations (MICE).
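A minimal pandas sketch of the synonym mapping and per-visit mean aggregation described above. The column names (`visit_id`, `variable`, `value`) and the synonym dictionary are hypothetical. Values still missing after aggregation would then go to the multiple-imputation step; scikit-learn’s `IterativeImputer` is one MICE-style implementation, not necessarily the one used in this study.

```python
import pandas as pd


def aggregate_visits(events, synonyms):
    """Map synonymous variable labels to one canonical name, then average each
    clinical variable over each visit (wide format: one row per visit_id).
    Variables never recorded for a visit are left as NaN for later imputation."""
    canonical = events.assign(
        variable=events["variable"].map(lambda name: synonyms.get(name, name)))
    return canonical.pivot_table(index="visit_id", columns="variable",
                                 values="value", aggfunc="mean")


# Hypothetical synonym map and long-format extract (one row per charted value).
SYNONYMS = {"Breath rate": "Respiratory Rate", "Res. rate": "Respiratory Rate"}
events = pd.DataFrame({
    "visit_id": [1, 1, 1, 2, 2],
    "variable": ["Respiratory Rate", "Breath rate", "Heart Rate", "Heart Rate", "Heart Rate"],
    "value": [18.0, 22.0, 90.0, 80.0, 100.0],
})
wide = aggregate_visits(events, SYNONYMS)
```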
Pre-processing the data results in a total of 1122 patient visits with a diagnosis of sepsis, for which we extracted 30 features. The statistical summary (minimum, first quartile, second quartile or median, mean, third quartile, and maximum) of each clinical variable is given in Table 1. The mean and median ages of patients are 68 and 79 years, respectively. Among all patient visits, the outcome for 244 (22%) visits is death (expired). The data include a large proportion of patient visits with a severe outcome because the study focuses on ICU patients.
| Clinical Variables (units) | Minimum | Q1 | Median (Q2) | Mean | Q3 | Maximum |
|---|---|---|---|---|---|---|
| Base Excess (mEq/L) | -31.0 | -5.0 | -2.3 | -2.7 | 0.0 | 16.5 |
| Blood CO2 (mEq/L) | 4.5 | 19.8 | 22.6 | 22.8 | 25.8 | 44.0 |
| Blood Hemoglobin (g/dL) | 6.0 | 9.1 | 9.9 | 10.1 | 11.0 | 19.5 |
| Blood Urea Nitrogen (mg/dL) | 1.0 | 17.0 | 29.0 | 36.01 | 48.2 | 212.5 |
| Body Temperature (°F) | 47.4 | 97.5 | 98.1 | 98.1 | 98.8 | 107.2 |
| Diastolic Blood Pressure (mmHg) | 18.0 | 50.8 | 56.4 | 57.1 | 63.0 | 90.3 |
| Glasgow Coma Scale Score | 3.0 | 10.6 | 13.9 | 12.5 | 15.0 | 15.0 |
| Heart Rate (/min) | 46.6 | 78.4 | 88.2 | 89.0 | 98.9 | 137.3 |
| O2 Flow (L/min) | 0.3 | 2.2 | 3.4 | 5.1 | 6.3 | 100.0 |
| Platelet Count (×1000/mm³) | 16.8 | 145.7 | 215.3 | 227.5 | 288.0 | 985.0 |
| Respiratory Rate (/min) | 10.7 | 17.7 | 20.3 | 20.5 | 22.9 | 38.1 |
| Serum Creatinine (mg/dL) | 0.2 | 0.8 | 1.2 | 1.9 | 2.0 | 141.9 |
| Serum Chloride (mEq/L) | 84.0 | 102.6 | 106.2 | 106.1 | 109.6 | 137.6 |
| Serum Glucose (mg/dL) | 30.3 | 107.9 | 126.7 | 135.4 | 150.8 | 447.7 |
| Serum Magnesium (mEq/L) | 1.1 | 1.8 | 2.0 | 2.0 | 2.1 | 18.3 |
| Serum Potassium (mEq/L) | 2.7 | 3.7 | 4.0 | 4.1 | 4.3 | 7.3 |
| Serum Sodium (mEq/L) | 118.3 | 136.9 | 139.2 | 139.3 | 141.8 | 163.1 |
| Systolic Blood Pressure (mmHg) | 0.0 | 102.2 | 109.9 | 111.7 | 120.7 | 210.1 |
| WBC Count (×1000/mm³) | 0.5 | 8.3 | 11.7 | 13.2 | 15.8 | 97.1 |
| Median Weight (kg) | 0.0 | 63.8 | 76.8 | 80.3 | 90.8 | 233.9 |
Besides demographics, vitals, and lab-based data, we also extracted the physician-prescribed dosage of IV fluids. Table 2 lists the names of IV fluids as recorded in the dataset. As mentioned earlier, a key challenge with medical data is the use of synonymous terms; many IV fluids are recorded under different names but refer to the same fluid. We translate these synonymous fluid names into a standard convention following a clinical expert’s suggestions. This study incorporates nine different types of IV fluids. Additives/solutes (e.g., potassium chloride) that were administered via an IV infusion were parsed to obtain the type of the underlying IV fluid. We assume that the absence of a specific type of IV fluid for a patient indicates that this type of IV fluid was not prescribed.
| IV name (in dataset) | Standardized IV fluid name | Standardized acronym |
|---|---|---|
| D10W | Dextrose 10% in Water | D10W |
| D5 1/2NS | D5 1/2NS | D5HNS |
| Potassium Chl 20 mEq / 1000 mL D5 1/2 NS | D5 1/2NS | D5HNS |
| Potassium Chl 40 mEq / 1000 mL D5 1/2 NS | D5 1/2NS | D5HNS |
| D5LR | Dextrose 5% in Lactated Ringer | D5LR |
| D5NS | Dextrose 5% in Normal Saline | D5NS |
| Iso-Osmotic Dextrose | Dextrose 5% | D5W |
| 5% Dextrose | Dextrose 5% | D5W |
| Dextrose 5% | Dextrose 5% | D5W |
| D5W (EXCEL BAG) | Dextrose 5% | D5W |
| 5% Dextrose (EXCEL BAG) | Dextrose 5% | D5W |
| Potassium Chl 40 mEq / 1000 mL D5W | Dextrose 5% | D5W |
| Amino Acids 4.25% W/ Dextrose 5% | | DNS |
| 1/2 NS | Half Normal Saline | HNS |
| 0.45% Sodium Chloride | Half Normal Saline | HNS |
| Lactated Ringers | Lactated Ringers | LR |
| 0.9% Sodium Chloride | Normal Saline | NS |
| 0.9% Sodium Chloride (Mini Bag Plus) | Normal Saline | NS |
| NS (Mini Bag Plus) | Normal Saline | NS |
| Iso-Osmotic Sodium Chloride | Normal Saline | NS |
| Isotonic Sodium Chloride | Normal Saline | NS |
| NS (Glass Bottle) | Normal Saline | NS |
| Potassium Chl 40 mEq / 1000 mL NS | Normal Saline | NS |
| Potassium Chl 20 mEq / 1000 mL NS | Normal Saline | NS |
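The name standardization in Table 2, combined with the assumption that an unrecorded fluid was not prescribed, can be sketched as follows. Only a subset of the table’s rows is included, the amounts are assumed to be in mL, and per-visit averaging mirrors the mean aggregation used for the other variables; `FLUID_ACRONYMS` and `fluid_features` are hypothetical names. Unmapped names raise a `KeyError` so that new synonyms surface for review instead of being silently dropped.

```python
# Subset of the raw-name -> acronym mapping in Table 2 (not the full table).
FLUID_ACRONYMS = {
    "D10W": "D10W",
    "D5 1/2NS": "D5HNS",
    "Potassium Chl 20 mEq / 1000 mL D5 1/2 NS": "D5HNS",
    "D5LR": "D5LR",
    "D5NS": "D5NS",
    "5% Dextrose": "D5W",
    "D5W (EXCEL BAG)": "D5W",
    "1/2 NS": "HNS",
    "0.45% Sodium Chloride": "HNS",
    "Lactated Ringers": "LR",
    "0.9% Sodium Chloride": "NS",
    "NS (Mini Bag Plus)": "NS",
}

# The nine standardized fluids (acronyms from Table 2).
ALL_FLUIDS = ["D10W", "D5HNS", "D5LR", "D5NS", "D5W", "DNS", "HNS", "LR", "NS"]


def fluid_features(orders):
    """Return the mean recorded amount per standardized fluid for one visit.

    orders: list of (raw_name, amount) pairs. Fluids never recorded for the
    visit get 0.0, i.e., they are treated as not prescribed.
    """
    amounts = {fluid: [] for fluid in ALL_FLUIDS}
    for raw_name, amount in orders:
        amounts[FLUID_ACRONYMS[raw_name]].append(amount)  # KeyError if unmapped
    return {fluid: (sum(v) / len(v) if v else 0.0) for fluid, v in amounts.items()}
```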
4.3 Feature Selection
Oftentimes a model with superior predictive performance can be produced by including or excluding certain features. The process of discovering the group of features that produces this superior model is called feature selection. Because the inverse classification procedure relies on an underlying model to make life-saving fluid resuscitation recommendations, it is of paramount importance that the most accurate model be learned. We therefore employ feature selection to further improve model performance, and thereby the accuracy of the probability space mapping captured by the learned model. Specifically, we propose a Classifier Subset Evaluation-based (CSE) feature selection method, described in Algorithm 1. Our CSE method can be viewed as a specific implementation of the more general ClassifierSubsetEval method found in Weka.
Algorithm 1 takes as input a training set $\mathcal{D}_{\text{train}}$, the full feature set $\mathcal{F}$, an arbitrary classifier $h$ with any necessary, user-specified parameters $\theta$, a classifier performance measure $M$, such as accuracy or AUC, and $k$, which specifies the number of consecutive non-improving iterations allowed before termination. Upon execution, the algorithm initializes the set $\mathcal{S}$, which will hold the selected features, to $\emptyset$; the iteration counter $t$ to zero; the termination flag $\mathit{done}$ to false; the performance $m_t$ of the classifier trained at the $t$th iteration to $m_0 = 0$; the improvement $\Delta_t$, in terms of $M$, obtained by adding the feature at the $t$th iteration (i.e., $f_t$) to infinity; and the number $c$ of successive non-improving iterations currently observed to 0. From here the algorithm iterates until $\mathit{done}$ is set to true, the conditions for which are expressed on lines 14 and 15 and will be explained momentarily.
At each iteration, $t$ is incremented by one, a feature $f_t$ is randomly selected from $\mathcal{F}$, and $\mathcal{F}$ is updated to exclude the selected feature ($\mathcal{F} \leftarrow \mathcal{F} \setminus \{f_t\}$) (line 3). Next (line 4), the selected feature is added to $\mathcal{S}$; subsequently, a model is trained using only the features in $\mathcal{S}$ (line 5) and evaluated in terms of metric $M$ to obtain $m_t$, and $\Delta_t$ is computed (line 6). On line 7, $\Delta_t$ is checked to see whether adding $f_t$ has improved predictive performance (this occurs when $\Delta_t > 0$). If $f_t$ does not improve the model ($\Delta_t = 0$) or worsens model performance ($\Delta_t < 0$), then the addition of $f_t$ to $\mathcal{S}$ is undone: $f_t$ is re-added to $\mathcal{F}$ (lines 8-9), and $c$ is incremented (line 10); otherwise (i.e., $\Delta_t > 0$), $c$ is reset to zero (lines 11-12). Line 14 specifies the termination criteria: the algorithm stops if $k$ consecutive iterations have failed to produce an improvement or if all features have been added to $\mathcal{S}$. Using the outlined CSE-based feature selection method, we improve the predictive performance of our model and thereby the accuracy of the probability space mapping.
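A compact Python sketch of the CSE loop in Algorithm 1. It assumes a hypothetical, user-supplied `score_subset` callable that trains the classifier $h$ on the given features and returns the validation measure $M$ (higher is better), and it measures $\Delta_t$ against the score of the currently accepted subset $\mathcal{S}$, which is our reading of the algorithm. Rejected features return to the candidate pool, as in lines 8-9, so the patience $k$ is what guarantees termination.

```python
import random


def cse_feature_selection(features, score_subset, patience, seed=0):
    """Greedy random-order feature selection in the spirit of Algorithm 1.

    features:     the full candidate feature set F
    score_subset: hypothetical callable; trains the classifier on a list of
                  features and returns the validation measure M (higher is better)
    patience:     k, consecutive non-improving iterations allowed before stopping
    Returns the selected features S and the score of the final accepted subset.
    """
    rng = random.Random(seed)
    remaining = list(features)  # F
    selected = []               # S
    best = 0.0                  # m_0 = 0
    misses = 0                  # c, consecutive non-improving iterations
    while remaining and misses < patience:
        f_t = rng.choice(remaining)  # draw f_t at random from F
        remaining.remove(f_t)
        score = score_subset(selected + [f_t])
        delta = score - best         # improvement over the accepted subset
        if delta > 0:
            selected.append(f_t)     # keep f_t in S
            best = score
            misses = 0
        else:
            remaining.append(f_t)    # undo: return f_t to F
            misses += 1
    return selected, best
```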
Previously, we described our proposed methodology and the selected clinical dataset. In this section, we illustrate the performance of our model on this dataset. The experimental results are organized into five parts. First, we experiment with our CSE-based feature selection method while searching for the best predictive model; the best model and the selected subset of features are used in subsequent experiments. Second, we conduct experiments to find the best IFE. Third, using our optimal predictive model and optimal IFE, we derive optimized treatments relative to a budget constraint. Fourth, we perform a robustness check of our model to evaluate its performance when physician recommendations are not available. Finally, we present the average changes recommended by our method for several selected budget values.
All results were obtained by first randomly partitioning our dataset into training, validation, and test sets. Since only approximately 20% of the dataset instances (patient visits) belong to the positive class, we ensured that equal proportions of positive instances were allocated to each set (i.e., 20% of the instances in each of the training, validation, and test sets are positive). The training, validation, and test sets contain 80%, 10%, and 10% of the data, respectively. All predictive models and IFE models were trained using the training set, and the best configuration of each model type was selected based on validation set performance. The test set was reserved for evaluating our recommendations and was not used in constructing or selecting predictive models. Finally, all features were normalized using min-max scaling.
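The partitioning and scaling steps above can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' code: the helper names are hypothetical, each class is split separately so that every subset keeps the roughly 20% positive rate, and the min-max bounds are fitted on the training set only (the text does not say which data the bounds were computed on).

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int],
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Return train/validation/test index lists with the same class proportions."""
    rng = random.Random(seed)
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


def min_max_scale(train_rows, rows):
    """Scale each column of `rows` using the min/max observed in `train_rows`."""
    columns = list(zip(*train_rows))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


labels = [1] * 20 + [0] * 80  # 20% positive, as in the dataset
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 80 10 10
print(sum(labels[i] for i in val))  # 2
```

Fitting the bounds on the training set avoids leaking validation or test information into preprocessing, at the cost that unseen values can fall slightly outside the training range.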
5.1 Variable Selection
As discussed in Section 3.1, we need to search for and select the best predictive model to estimate the probability of mortality (Equation 4). Our first model-building step is to eliminate clinical features that do not contribute to prediction and may even detract from predictive performance. We therefore establish a variable selection procedure (Section 4.3) that integrates well with our problem setting and is one of the contributions of this paper.
Table 3 lists both the independent (predictor) and dependent (response) features. The independent variables are divided into three categories: directly changeable, indirectly changeable, and unchangeable features. In our dataset, the amount and type of the prescribed IV fluid are under the direct control of the physician; hence, these variables are directly changeable features. Vitals (e.g., heart rate, blood pressure) and lab-based measurements (e.g., creatinine, white blood cell count) cannot be manipulated directly, but can be manipulated indirectly by changing the administered IV fluids (the directly changeable features). Therefore, all patient vitals and lab-based measurements are indirectly changeable features. The dataset also includes patient attributes such as age and gender, which cannot be altered; hence, demographic features are unchangeable features.
Feature selection was performed using CSE, discussed in Section 4.3. CSE can be applied to each candidate predictive model in an "online" fashion, or as a pre-processing step, in which a single classifier is selected and used to find the subset of features that is then used when searching for the best predictive model. We adopt the latter, pre-processing strategy to reduce experiment compute time. Our selected classifier was a neural network trained for 150 epochs with a single hidden layer containing three hidden nodes. This parameterization was selected because it is representative of the parameterizations explored during the model tuning phase, the results of which are discussed in the next subsection.
Table 3 shows the variables before and after applying the variable selection procedure. Of the IV fluids, only D5HNS, D5LR, D5W, LR and NS were retained, i.e., these are the only fluids whose inclusion improved the predictive performance of the mortality model during selection. We also observe that the indirectly changeable variable set was reduced from 27 to 20 features, and the unchangeable set from three to two (weight was removed). Discharge type is the outcome, or dependent feature. In the next subsection, we show the performance of the predictive model on the data with the complete set of features and on the data with only the selected features.
| Variable category | Feature type | Complete set of variables | Selected variables |
|---|---|---|---|
| Independent | Directly changeable | Amount of fluid resuscitation (D10W, D5HNS, D5LR, D5NS, D5W, DNS, HNS, LR and NS) | D5HNS, D5LR, D5W, LR, NS |
| Independent | Indirectly changeable | Base excess, blood CO2, blood hemoglobin, blood urea nitrogen, body temperature, diastolic blood pressure, Glasgow Coma Scale, heart rate, hematocrit, lactate, O2 flow, PaCO2, PaO2, pH, PO2, PT, PTT, platelet, respiratory rate, serum creatinine, serum chloride, serum glucose, serum magnesium, serum potassium, serum sodium, systolic blood pressure, WBC (27 variables) | Base excess, blood CO2, blood hemoglobin, blood urea nitrogen, diastolic blood pressure, Glasgow Coma Scale, heart rate, hematocrit, lactate, PaCO2, PaO2, PO2, PTT, platelet, respiratory rate, serum creatinine, serum glucose, serum sodium, systolic blood pressure, WBC (20 variables) |
| Independent | Unchangeable | Age, gender, weight | Age, gender |
| Dependent | Outcome | Discharge type (binary) | Discharge type |
5.2 Predictive Model Tuning
In Section 5.1, CSE is used to find the best subset of features. In this section, we employ a grid search to find the model with the best predictive performance and, therefore, the best probability space mapping. We apply the grid search to two datasets: one containing the full set of features and one containing only the features selected by CSE. This allows us not only to choose the best model, but also to assess whether CSE in fact produces a superior model. We limit our study to logistic regression and neural network variants. After determining the optimal predictive model and dataset, a grid search is performed to find the optimal IFE function; this search is limited to multivariate linear regression and neural network variants.
Accuracy and AUC are employed to assess the performance of our classification models. Accuracy is the number of correctly predicted instances divided by the total number of instances. As a metric, however, accuracy is susceptible to class imbalance, which is present in our dataset (only 20% of instances are positive): a model that always predicts the negative class would still achieve 80% accuracy. Therefore, we also adopt the AUC metric, which is insensitive to class imbalance. Plotting the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds yields the receiver operating characteristic (ROC) curve, and the area under this curve is the AUC. A model that predicts only the majority class (e.g., always the negative class) has an AUC of 0.50 regardless of the class proportions, which is why AUC is considered insensitive to class imbalance. An AUC of 1.0 represents perfect predictions.
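The claim that a majority-class predictor still earns high accuracy but only 0.50 AUC can be checked with a short sketch. The AUC here is computed through its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half), which equals the area under the ROC curve; the function names are our own.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [1] * 20 + [0] * 80  # 20% positives, mirroring the class balance above
constant_scores = [0.1] * len(y)  # a model that scores every patient the same
predicted = [int(s > 0.5) for s in constant_scores]  # always predicts negative
print(accuracy(y, predicted))  # 0.8
print(roc_auc(y, constant_scores))  # 0.5
```

The pairwise loop is quadratic in the number of instances and is only meant to make the definition explicit; a sort-based rank computation gives the same value more efficiently.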
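Table 4: Grid search results. For each model (number of hidden nodes, HN, and training epochs), training (Tr) and validation (Val) accuracy (Acc) and AUC are reported on the dataset with all features and on the dataset with the selected features.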
Table 4 presents the results of our grid search in terms of accuracy and AUC, obtained on both the training and validation sets, for the two datasets (all features and selected features) discussed earlier in this section. "Tr" denotes the training set and "Val" the validation set. We varied the number of training epochs from 100 to 250 for all models, and the number of hidden nodes in our neural network from 3 to 10. Note that logistic regression can be viewed as a neural network with no hidden layer. The Adam optimizer was used for training all models with an initial learning rate set to . The best-performing model is a neural network with three hidden nodes (AUC: 0.8792) trained on the feature-selected dataset. We therefore adopt this model and dataset for the remainder of our experiments.
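The model-selection loop can be sketched as follows. The `train_and_score` callback is a hypothetical stand-in for training a network with the Adam optimizer and returning its validation AUC, a hidden-node count of 0 encodes logistic regression (no hidden layer), and the epoch grid in steps of 50 is our assumption; the text only gives the 100 to 250 range.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[int, int, str]  # (hidden nodes, epochs, dataset name)


def grid_search(
    train_and_score: Callable[[int, int, str], float],
    hidden_nodes: Iterable[int],
    epochs: Iterable[int],
    datasets: Iterable[str],
) -> Tuple[Config, float, Dict[Config, float]]:
    """Evaluate every configuration and keep the one with the highest validation AUC."""
    results: Dict[Config, float] = {}
    best_config, best_auc = None, float("-inf")
    for config in product(hidden_nodes, epochs, datasets):
        auc = train_and_score(*config)
        results[config] = auc
        if auc > best_auc:  # strict '>' keeps the first configuration on ties
            best_config, best_auc = config, auc
    return best_config, best_auc, results


# Hypothetical scorer standing in for real training runs.
def fake_scorer(hidden: int, n_epochs: int, dataset: str) -> float:
    return 0.80 + (0.05 if dataset == "selected" else 0.0) - 0.01 * abs(hidden - 3)


best, best_auc, table = grid_search(
    fake_scorer,
    hidden_nodes=[0] + list(range(3, 11)),  # 0 = logistic regression
    epochs=[100, 150, 200, 250],
    datasets=["all", "selected"],
)
print(best)  # (3, 100, 'selected')
```

Keeping the full `results` dictionary mirrors Table 4: every configuration's validation score is retained, not just the winner.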
Next, we execute a grid search to find the optimal IFE function. Recall that the IFE takes the directly changeable and unchangeable features as input and estimates the indirectly changeable features; these estimates are used during the recommendation procedure. Figure 4 illustrates the performance of the candidate IFE models on the validation set with 250 epochs; we explored between 100 and 350 epochs and found that 250 epochs produced the best results. Figures 4(a) and 4(b) show the Mean Absolute Error (MAE) and Mean Squared Error (MSE), respectively. We used MSE as the performance measure to select the best model. The results show that a neural network with ten hidden nodes produces the best model, with an MSE of 0.015.
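Because the IFE predicts several indirectly changeable features at once, its error can be averaged over every predicted value. A small sketch of the two error measures and of selecting the candidate with the lowest validation MSE follows; the pooling over features, the candidate names, and the numbers are our own illustrative assumptions.

```python
from typing import Dict, List, Sequence

Rows = Sequence[Sequence[float]]  # one row of indirect-feature values per patient


def _errors(y_true: Rows, y_pred: Rows) -> List[float]:
    """Flatten per-feature residuals across all patients."""
    return [t - p for t_row, p_row in zip(y_true, y_pred) for t, p in zip(t_row, p_row)]


def mae(y_true: Rows, y_pred: Rows) -> float:
    errs = _errors(y_true, y_pred)
    return sum(abs(e) for e in errs) / len(errs)


def mse(y_true: Rows, y_pred: Rows) -> float:
    errs = _errors(y_true, y_pred)
    return sum(e * e for e in errs) / len(errs)


def select_by_mse(candidates: Dict[str, Rows], y_val: Rows) -> str:
    """Return the name of the candidate IFE with the lowest validation MSE."""
    return min(candidates, key=lambda name: mse(y_val, candidates[name]))


y_val = [[0.0, 1.0], [0.5, 0.5]]
candidates = {
    "linear": [[0.0, 0.0], [0.5, 1.0]],  # hypothetical predictions
    "nn_10_hidden": [[0.1, 0.9], [0.5, 0.5]],
}
print(mae(y_val, candidates["linear"]), mse(y_val, candidates["linear"]))  # 0.375 0.3125
print(select_by_mse(candidates, y_val))  # nn_10_hidden
```

MSE penalizes large residuals more heavily than MAE, which favors an estimator that avoids occasional large errors in the indirect features fed to the predictive model.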
Table 5 lists the selected models that will be used in subsequent experiments.
| Model | Selected configuration |
|---|---|
| Predictive model | Neural network with three hidden nodes and sigmoid output activation |
| IFE | Neural network with ten hidden nodes and no output activation |
5.3 Human in the Loop Recommendations: Probability Improvement
Using the optimal predictive model and IFE discovered in the preceding subsection, we apply our human-in-the-loop inverse classification formulation, discussed in Section 3.2, to the test set. The experiments are performed by varying the budget from 0.1 to 1 in increments of 0.1. In these experiments, the IV fluid values specified by each physician may be changed only by a cumulative amount no greater than the budget, according to our formulation in Section 3.2. Therefore, larger budgets allow larger changes to be made to the physician's recommendation.
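To make the role of the budget concrete, the sketch below runs projected gradient descent on a deliberately simplified problem. It assumes (our assumption, not a statement of the Section 3.2 formulation) that the cumulative change is the sum of absolute changes to the IV fluid values, so the feasible changes form an L1 ball whose radius is the budget; the projection follows the standard sort-based method of Duchi et al. (2008). The mortality model is a hypothetical logistic function of the fluid values alone, so the IFE, the unchangeable features, and any bounds on the fluid values are omitted.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def project_l1_ball(v: Sequence[float], radius: float) -> List[float]:
    """Euclidean projection of v onto {d : sum(|d_i|) <= radius}."""
    if radius <= 0:
        return [0.0] * len(v)
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already within budget
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t <= 0:
            break
        theta = t  # soft-threshold for the largest j with uj > t
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def recommend(
    x0: Sequence[float], w: Sequence[float], b: float, budget: float,
    lr: float = 0.5, steps: int = 200,
) -> Tuple[List[float], float, float]:
    """Minimize p(x) = sigmoid(w.x + b) over x = x0 + delta with sum|delta| <= budget."""
    def prob(x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    delta = [0.0] * len(x0)
    for _ in range(steps):
        p = prob([xi + di for xi, di in zip(x0, delta)])
        grad = [p * (1.0 - p) * wi for wi in w]  # dp/dx for the logistic model
        step = [di - lr * gi for di, gi in zip(delta, grad)]
        delta = project_l1_ball(step, budget)  # enforce the budget
    x_new = [xi + di for xi, di in zip(x0, delta)]
    return delta, prob(x0), prob(x_new)


# Hypothetical weights for three fluids and a physician's (scaled) starting values.
physician_values = [0.2, 0.2, 0.2]
weights = [0.5, -2.0, 0.3]
for budget in (0.1, 0.3, 0.5):
    delta, p_before, p_after = recommend(physician_values, weights, b=0.0, budget=budget)
    print(budget, [round(d + 0.0, 3) for d in delta], round(p_before, 3), round(p_after, 3))
```

Because this toy model is linear in the logit, the whole budget ends up on the fluid with the largest weight magnitude; with a neural network predictor and an IFE, the recommended changes need not concentrate this way.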
Our method derives optimal, patient-specific IV fluid recommendations along with an estimated risk (probability) of mortality (the objective function) at each of these budget levels. Figure 5 shows the average results across budget levels. Figure 5(a) shows the average probability of mortality (i.e., death due to sepsis; y-axis) at each budget level (x-axis) as a blue line; the gray shading indicates one quarter of one standard deviation above and below the average. As expected, the probability of mortality decreases (on average) as the budget increases. The average probability of mortality with no adjustment to the physician-specified IV fluid values is 0.46, while the average probability of mortality when the IV fluid values may be adjusted by the maximum budget (1.0) is 0.37.
Figure 5(b) shows the relative probability improvement (y-axis) at each budget setting (x-axis) as a red line; the gray shading again indicates one quarter of one standard deviation above and below the average. The relative improvement grows as the budget increases. Under a limited budget, the average relative reduction in the probability of mortality is about 22%, which is substantially larger than the 1.8-3.6% improvement reported by Raghu et al. (2017). These results suggest that the proposed model could substantially improve patients' estimated chances of survival by adjusting the infusion of IV fluids toward their optimal values.
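The averaged probabilities quoted for Figure 5(a) (0.46 without adjustment and 0.37 at the maximum budget) correspond to a relative reduction of about 19.6%. An average of per-patient relative improvements, such as the 22% figure, need not match this value, because the mean of ratios generally differs from the ratio of means. The two-patient example below uses hypothetical probabilities to show the difference.

```python
from statistics import mean
from typing import Sequence


def relative_improvement(p_before: float, p_after: float) -> float:
    """Relative reduction in the predicted probability of mortality."""
    return (p_before - p_after) / p_before


def mean_of_ratios(before: Sequence[float], after: Sequence[float]) -> float:
    return mean(relative_improvement(b, a) for b, a in zip(before, after))


def ratio_of_means(before: Sequence[float], after: Sequence[float]) -> float:
    return relative_improvement(mean(before), mean(after))


before = [0.8, 0.2]  # hypothetical per-patient probabilities before adjustment
after = [0.7, 0.1]  # hypothetical probabilities after the recommended adjustment
print(round(mean_of_ratios(before, after), 4))  # 0.3125
print(round(ratio_of_means(before, after), 4))  # 0.2
print(round(relative_improvement(0.46, 0.37), 3))  # 0.196
```

Low-risk patients weigh heavily in a mean of ratios, since a small absolute drop is a large relative one for them.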
5.4 Human in the Loop Recommendations: Robustness
In Section 5.3, we applied our proposed methodology to our dataset and showed its benefit in terms of average probability improvement. The proposed model uses a physician's recommendations to determine optimal, patient-specific IV fluid dosing. However, we also wanted to investigate the robustness of our model in the absence of any physician input. We refer to scenarios in which the physician's recommendations are incorporated as human-in-the-loop initialization, and to scenarios with no physician input as random initialization. We use the term initialization because the input values represent the starting point of the optimization procedure. The specified values are therefore likely to affect the recommendation and, consequently, the amount of probability improvement that can be extended to each patient (test instance). We therefore compare the results obtained using physician inputs with those obtained using random inputs. For each test instance, we randomly initialized each of the IV fluid features to values within a narrow range of small values, to reflect a cautious initialization (i.e., small rather than large values).
Figure 6 illustrates the results obtained using human-in-the-loop initialization and random initialization. As in the preceding subsection, we show both actual and relative probability improvement at varying budget levels. Figure 6(a) shows the average probability of mortality: the x-axis represents the budget and the y-axis the average probability of mortality. The green line represents human-in-the-loop initialization and the purple line random initialization. Integrating a "human into the loop" yields a lower predicted mortality than random initialization, which demonstrates the importance of the human-in-the-loop component of our formulation. Nevertheless, the results also show that poor (i.e., random) initializations can still be turned into recommendations that provide comparable benefits in terms of probability improvement.
Similarly, Figure 6(b) shows the relative probability improvement across budget levels for both types of initialization, again with the green line for human-in-the-loop initialization and the purple line for random initialization. The same observations made for Figure 6(a) apply here. In addition, as the budget increases, the results obtained from random initialization move closer to those obtained from physician initialization. This suggests that, given a sufficiently large budget, random initialization can come close to providing the same probability improvement.
5.5 Human in the Loop Recommendations: Average Recommendations
In this section, we present the average recommendations made by our human-in-the-loop method. While many scoring criteria [15, 18, 45] and tools [17, 20] exist to assess the mortality risk of septic patients, few studies comprehensively assess risk, account for physician input, and provide treatment recommendations. Our proposed method provides all three. Because it is impractical to present individual recommendations, we present average recommendations by budget value.
Figure 7 shows the average recommendations obtained by applying our method. Figures 7(a) to 7(d) each correspond to a different budget level (0.3, 0.5, 0.7 and 0.9). The x-axis shows each IV fluid, and the y-axis the average recommended change to the physician's input; any deviation from zero therefore represents a recommended change. A positive (negative) value indicates that the physician's recommended IV fluid value should be increased (decreased). The results are further stratified into "predicted positive" (red) and "predicted negative" (blue) instances. An instance was predicted as belonging to the positive class, representing "expiration due to sepsis", if its predicted probability of mortality was greater than 50%, and as negative otherwise.
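The per-class averages described above can be sketched as follows. The function name and data are hypothetical, and we assume the 50% threshold is applied to the predicted probability of mortality before the recommended change; the text does not state which probability is used.

```python
from typing import Dict, List, Sequence


def average_changes_by_class(
    probs: Sequence[float],
    changes: Sequence[Sequence[float]],
    fluids: Sequence[str],
    threshold: float = 0.5,
) -> Dict[str, Dict[str, float]]:
    """Average recommended change per fluid, split by predicted class."""
    groups: Dict[str, List[Sequence[float]]] = {
        "predicted positive": [],
        "predicted negative": [],
    }
    for p, change in zip(probs, changes):
        # Positive ("expiration due to sepsis") only if the probability exceeds the threshold.
        key = "predicted positive" if p > threshold else "predicted negative"
        groups[key].append(change)
    return {
        key: {f: sum(row[i] for row in rows) / len(rows) for i, f in enumerate(fluids)}
        for key, rows in groups.items()
        if rows  # skip a class with no instances
    }


fluids = ["D5LR", "NS"]
probs = [0.7, 0.3, 0.6]  # hypothetical predicted probabilities of mortality
changes = [[0.2, -0.1], [0.1, 0.0], [0.4, -0.3]]  # hypothetical recommended changes
print(average_changes_by_class(probs, changes, fluids))
```

A probability of exactly 0.5 falls in the negative group, matching the "greater than 50%" rule.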
Inspecting Figures 7(a) through 7(d), we can clearly see that a larger budget allows greater changes to be made to a physician's suggested IV fluid values. For example, with a budget of 0.3, the average suggested change in D5LR for predicted positive and predicted negative cases is 0.15 and 0.17, respectively, whereas with a budget of 0.9 it is 0.47 and 0.51, respectively. These figures also provide the following insights:
1. On average, our model suggests increasing the intake of D5LR, D5HNS and D5W, while recommending that LR and NS be decreased. These observations are in alignment with the literature, which suggests that resuscitation using LR is associated with increased renal failure.
2. Our model's recommendations encourage resuscitation using 5% dextrose in lactated Ringer's (D5LR) among septic patients in ICUs to improve the predicted probability of survival.
This study proposes a clinical prescriptive model with human-in-the-loop functionality that recommends optimal, individual-specific amounts of IV fluids for the treatment of septic patients in ICUs. The proposed methodology combines constrained optimization and machine learning techniques to arrive at optimal solutions. A key novelty of the proposed clinical model is its use of a physician's input to derive optimal solutions. The efficacy of the method is demonstrated using a real-world medical dataset. We further validated the robustness of the proposed approach, showing that our method benefits from the human-in-the-loop component but is also robust to poor input, which is a crucial consideration for new physicians. The results showed that, under a limited budget, the optimal solution can reduce the average predicted probability of mortality by 22% in relative terms. The proposed method could potentially be embedded in an existing electronic health record system to make life-saving IV fluid recommendations. After additional clinical variables are included and the model is prospectively validated, it could also be used to train junior physicians to synthesize appropriate treatment strategies and to help prevent user error. An important limitation of the model is the exclusion of vasopressors and antibiotics, two important classes of drugs used to treat sepsis.
The authors would like to thank Dr. Rebekah Child, Ph.D., RN, Associate Professor, California State University - Northridge for initially discussing the medical importance of the problem.
- (2010) The inverse classification problem. Journal of Computer Science and Technology 25 (May), pp. 458–468.
- Understanding support vector machine classifications via a recommender system-like approach. In Proceedings of the International Conference on Data Mining, pp. 305–311.
- (2018) Graph convolutional matrix completion. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Deep Learning Day (KDD DLDay).
- (2002) Hybrid recommender systems: survey and experiments. User Modeling and User-Adapted Interaction 12 (4), pp. 331–370.
- (2010) mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, pp. 1–68.
- (2018) Sepsis and septic shock. The Lancet 392 (10141), pp. 75–87.
- Centers for Disease Control and Prevention. https://www.cdc.gov/sepsis/datareports/index.html. Accessed: 07.07.2020.
- (2016) Choice of fluid therapy in the initial management of sepsis, severe sepsis, and septic shock. Shock (Augusta, Ga.) 46 (1), pp. 17.
- (2017) Locally connected deep learning framework for industrial-scale recommender systems. In Proceedings of the 26th International Conference on World Wide Web Companion, pp. 769–770.
- (2012) Individualized patient-centered lifestyle recommendations: an expert system for communicating patient specific cardiovascular risk information and prioritizing lifestyle options. Journal of Biomedical Informatics 45 (6), pp. 1164–1174.
- (2014) Are adequate fluid challenges prescribed for severe sepsis? International Journal of Health Care Quality Assurance.
- (2013) Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock, 2012. Intensive Care Medicine 39 (2), pp. 165–228.
- (2016) Combining human and artificial intelligence for analyzing health data. In 2016 AAAI Spring Symposium Series.
- (2008) Fluid therapy in resuscitated sepsis: less is more. Chest 133 (1), pp. 252–263.
- (2001) Serial evaluation of the SOFA score to predict outcome in critically ill patients. JAMA 286 (14), pp. 1754–1758.
- (2016) The WEKA Workbench. Online appendix for "Data Mining: Practical Machine Learning Tools and Techniques", 4th edition.
- Utilizing time series data embedded in electronic health records to develop continuous mortality risk prediction models using hidden Markov models: a sepsis case study. Statistical Methods in Medical Research, pp. 0962280220929045.
- (2018) Using statistical and machine learning methods to evaluate the prognostic accuracy of SIRS and qSOFA. Healthcare Informatics Research 24 (2), pp. 139–147.
- Clinical decision support system to assess the risk of sepsis using tree augmented Bayesian networks and electronic medical record data. Health Informatics Journal, pp. 1460458219852872.
- (2015) A targeted real-time early warning score (TREWScore) for septic shock. Science Translational Medicine 7 (299), pp. 299ra122.
- (2016) Towards interactive machine learning (iML): applying ant colony algorithms to solve the traveling salesman problem with the human-in-the-loop approach. In International Conference on Availability, Reliability, and Security, pp. 81–95.
- (2016) Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Informatics 3 (2), pp. 119–131.
- (1995) Hypertonic sodium resuscitation is associated with renal failure and death. Annals of Surgery 221 (5), pp. 543.
- (2016) MIMIC-III, a freely accessible critical care database. Scientific Data 3, pp. 160035.
- (2016) Convolutional matrix factorization for document context-aware recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240.
- (2015) Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations (ICLR 2015).
- Collaborative recurrent neural networks for dynamic recommender systems. In Journal of Machine Learning Research: Workshop and Conference Proceedings, Vol. 63.
- (2018) The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nature Medicine 24 (11), pp. 1716–1720.
- Generalized inverse classification. Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 162–170.
- (2017) A budget-constrained inverse classification framework for smooth classifiers. In Data Mining Workshops (ICDMW), 2017 IEEE International Conference on, pp. 1184–1193.
- (2018) Optimizing outcomes via inverse classification.
- (2018) Comparison-based inverse classification for interpretability in machine learning. In International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 100–111.
- (2014) Increased fluid administration in the first three hours of sepsis resuscitation is associated with reduced mortality: a retrospective cohort study. Chest 146 (4), pp. 908–915.
- (2011) Content-based recommender systems: state of the art and trends. In Recommender Systems Handbook, pp. 73–105.
- (2000) The cost minimizing inverse classification problem: a genetic algorithm approach. Decision Support Systems 29 (3), pp. 283–300.
- (2013) Gradient methods for minimizing composite objective function. Mathematical Programming, Series B 140, pp. 125–161.
- (2002) A potential use of data envelopment analysis for the inverse classification problem. Omega 30 (3), pp. 243–248.
- (2012) Colloids versus crystalloids for fluid resuscitation in critically ill patients. Cochrane Database of Systematic Reviews (6).
- (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 10 (3), pp. 61–74.
- (2017) Continuous state-space models for optimal sepsis treatment: a deep reinforcement learning approach. arXiv preprint arXiv:1705.08422.
- (2017) Recognizing sepsis as a global health priority: a WHO resolution. New England Journal of Medicine 377 (5), pp. 414–417.
- (2017) Surviving Sepsis Campaign: international guidelines for management of sepsis and septic shock: 2016. Intensive Care Medicine 43 (3), pp. 304–377.
- (2014) Fluid resuscitation in sepsis: a systematic review and network meta-analysis. Annals of Internal Medicine 161 (5), pp. 347–355.
- (2009) A survey of collaborative filtering techniques. Advances in Artificial Intelligence 2009.
- (1996) The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. Springer-Verlag.
- (2011) Type of fluid in severe sepsis and septic shock. Minerva Anestesiologica 77 (12), pp. 1190–1196.
- (2012) 10-year CVD risk prediction and minimization via inverse classification. In Proceedings of the 2nd ACM SIGHIT Symposium on International Health Informatics (IHI '12), pp. 603–610.
- (2019) Deep learning based recommender system: a survey and new perspectives. ACM Computing Surveys (CSUR) 52 (1), pp. 5.