In typical classification settings, a model is trained and used to make predictions about some event of interest. Depending upon the predictive task, some action may then be taken. In a medical domain, a patient may be monitored more carefully if a prediction yields a high likelihood of some negative outcome. However, in the same setting, we may want to know what actions can be taken to minimize the patient's chances of said adverse event occurring. The process of finding the optimal set of actions, or changes, that can be taken in order to minimize the probability of such events occurring is what we term inverse classification.
This example domain further highlights the nature and importance of the problem. Consider, specifically, the problem of mitigating the long-term risk of cardiovascular disease (CVD) of Patient 29 taken from our experiments below. Initially, we use a constructed model to estimate this patient’s risk, or probability, of developing CVD, which is found to be 55%. This estimate is based on pertinent factors such as medications, lab measurements (e.g., blood glucose), lifestyle (e.g., diet), and demographics (e.g., age).
Following an initial assessment of risk, we would like to work 'backwards' through our learned model to obtain recommendations that reduce Patient 29's probability of CVD. Past methods, however, restrict the set of classifiers that can be used to obtain such recommendations, often only affording the use of a single algorithm. Such restrictions are prohibitive in that a particular classifier may have useful properties. These might include high predictive accuracy, such as the random forest used to obtain Patient 29's initial level of risk, or a high degree of explanatory power, which may help the patient better understand why certain recommendations were made. Therefore, we propose generalized inverse classification (GIC), which permits the use of virtually any classification function, requiring only a simple, non-prohibitive assumption (further discussed in Section 3). This is the first contribution of this work.
Our second contribution is to show that the problem can be solved using heuristics. Specifically, we propose three real-valued heuristic-based methods that solve this problem, which we compare to two sensitivity analysis-based baseline methods. We demonstrate the efficacy of these methods on two freely available datasets, one of which includes Patient 29, whose risk we lower from 55% to less than 30% (Section 4). Third, we refine an existing inverse classification framework to include non-linear cost-to-change functions, which we then incorporate into our experiments. Section 3 outlines the framework, the generalized inverse classification problem, the three heuristic-based methods, and the two sensitivity analysis-based methods, while Section 5 concludes the paper.
2 Related Work
Inverse classification is akin to the sub-discipline of sensitivity analysis, which examines the impact of predictive algorithm inputs on the output. While there are many forms of sensitivity analysis [1, 2], local and variable perturbation methods are most similar to our setting. Based on this, we develop two sensitivity analysis-based methods, related in Section 3, for comparison purposes.
Past works on inverse classification differ with respect to three distinct perspectives: operational data types, algorithmic mechanism, and framework. The operational data types encoding the data on which inverse classification is performed are either discrete [3, 4, 5], continuous [6, 7, 8, 9], or both. The latter two allow for more fine-grained results, leading to greater precision in the recommendations made.
The algorithmic mechanism operates on these data types by finding the feasible recommendations that optimize the predicted probability. Such optimization strategies are constructed to be greedy [3, 4, 5, 6] or non-greedy [7, 8, 9, 10].
The framework ensures that recommended changes are feasible and implementable. These include: (1) identifying features that can be changed (e.g., age cannot be), (2) the difficulty in implementing changes (feature-specific costs), and (3) a restriction on the cumulative change (budget). In [3, 8] there are no constraints imposed. Of those that do impose constraints:
- One approach imposes constraints that lead to non-extreme recommendations, but considers none of (1), (2), or (3).
- Another imposes (2), but not (1) or (3).
- A different notion of (1), (2), and (3) is explored by matching discrete entities to compute features.
- Elsewhere, (1), (2), and (3) are all considered, but the method does not permit nondifferentiable classifiers.
Real-valued heuristic-based methods are also relevant to this work. These methods include variable neighborhood search (VNS), genetic algorithms, and hill-climbing [4, 6]. In this work we elect to focus on genetic algorithms, hill-climbing, and local search, which can be viewed as a simpler form of VNS. As will be shown, by using heuristic-based methods, we can be as general as possible in solving the inverse classification problem.
3 Generalized Inverse Classification
In this section we first briefly discuss GIC. Subsequently, we outline our inverse classification framework. Next, we relate three heuristic-based methods that can be used to solve GIC. Finally, we introduce two sensitivity analysis-based methods that will be compared to our heuristic-based methods.
Under the GIC formulation, no assumptions are made about the classification function other than that it can be evaluated at arbitrary feasible points. Such a level of generality allows us to obtain solutions for nondifferentiable functions. These functions include popular ensemble techniques, such as bagging and boosting, as well as C4.5 decision trees. Classifiers such as these are often found to have high predictive power (ensembles) or to be more readily interpretable and explainable (e.g., C4.5 decision trees), which is why it is so important that methods be developed to incorporate such classifiers.
Suppose $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ is a dataset of $n$ instances, where $\mathbf{x}_i$ is a column feature vector of length $p$ and $y_i \in \{0, 1\}$ is the binary label associated with $\mathbf{x}_i$ for $i = 1, \dots, n$. Let $f(\mathbf{x})$ be a function that computes the probability of $\mathbf{x}$ being in the positive class (with $y = 1$). Typically, $f$ is based on a certain classification model built on the dataset. Given a new instance with feature vector $\bar{\mathbf{x}}$, we want to modify some components of $\bar{\mathbf{x}}$, subject to some budget constraints, so that the predicted probability of being positive is minimized.

We further partition the features into three subsets, $U$, $D$, and $I$, which represent the sets of unchangeable, directly changeable, and indirectly changeable features, respectively. When we optimize the features, we can only determine the value of $\mathbf{x}_D$, and the values of $\mathbf{x}_I$ will depend on $\mathbf{x}_U$ and $\mathbf{x}_D$. Therefore, we model the dependency of $\mathbf{x}_I$ on $\mathbf{x}_U$ and $\mathbf{x}_D$ as $\mathbf{x}_I = H(\mathbf{x}_U, \mathbf{x}_D)$, where the mapping $H$ is assumed to be differentiable. Note that the mapping $H$ can be any predictive model constructed using the same training instances. Therefore, we represent $\mathbf{x}$ as $(\mathbf{x}_U, \mathbf{x}_D, \mathbf{x}_I)$ to distinguish these three blocks, so that the feature optimization problem can be formulated as

$$\min_{\mathbf{x}_D} \; f(\mathbf{x}_U, \mathbf{x}_D, H(\mathbf{x}_U, \mathbf{x}_D)) \quad \text{s.t.} \quad c(\mathbf{x}_D, \bar{\mathbf{x}}_D) \le B, \;\; l_j \le x_j \le u_j \;\; \forall j \in D.$$
Here, we assume the reasonable value of each directly changeable feature $x_j$ in $\mathbf{x}_D$ must be within an interval, denoted by $[l_j, u_j]$ for $j \in D$. If $x_j$ can only be increased (decreased), we can set $l_j = \bar{x}_j$ ($u_j = \bar{x}_j$). In addition, $c(\mathbf{x}_D, \bar{\mathbf{x}}_D)$ is a convex cost function that measures the cost for changing $\bar{\mathbf{x}}_D$ to $\mathbf{x}_D$, and $B$ is the total budget we have to support this change. We require $B \ge 0$.
Here, we provide two examples of $c$. The first assumes the cost increases linearly as $x_j$ deviates from $\bar{x}_j$:

$$c(\mathbf{x}_D, \bar{\mathbf{x}}_D) = \sum_{j \in D} c_j^{+} \max(x_j - \bar{x}_j, 0) + c_j^{-} \max(\bar{x}_j - x_j, 0), \quad (2)$$

where $c_j^{+} \ge 0$ and $c_j^{-} \ge 0$ denote the costs for increasing and decreasing feature $j$ by one unit, for $j \in D$. If one assumes the costs increase quadratically as $x_j$ deviates from $\bar{x}_j$, then

$$c(\mathbf{x}_D, \bar{\mathbf{x}}_D) = \sum_{j \in D} c_j^{+} \max(x_j - \bar{x}_j, 0)^2 + c_j^{-} \max(\bar{x}_j - x_j, 0)^2. \quad (3)$$

Note that the constants $c_j^{+}$ and $c_j^{-}$ in (2) and (3) can be different. In both cost functions, if decreasing (increasing) $x_j$ is cost-free, we can set $c_j^{-} = 0$ ($c_j^{+} = 0$). In the rest of this paper, we will focus only on the quadratic cost in (3).
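As an illustration, the quadratic cost in (3) charges upward and downward deviations separately and grows with the square of the change; a minimal sketch (array names are ours):

```python
import numpy as np

def quadratic_cost(x_d, x_bar_d, c_plus, c_minus):
    """Quadratic cost of (3): deviations above and below the original
    feature values are charged separately and grow with the square."""
    x_d, x_bar_d = np.asarray(x_d, float), np.asarray(x_bar_d, float)
    inc = np.maximum(x_d - x_bar_d, 0.0)   # upward deviation per feature
    dec = np.maximum(x_bar_d - x_d, 0.0)   # downward deviation per feature
    return float(np.sum(np.asarray(c_plus) * inc ** 2 + np.asarray(c_minus) * dec ** 2))
```

A change is feasible whenever this quantity does not exceed the budget.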
The projection mapping onto the feasible set is defined as the mapping that takes a candidate point to the nearest point satisfying the bound and budget constraints; computing it amounts to solving (6). We define a subroutine for this purpose, given in Algorithm 1, whose validity can be easily verified via the KKT conditions of (4). Note that the bisection search in Algorithm 1 always succeeds because the quantity being driven to zero monotonically decreases to zero as the multiplier increases to infinity.
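Algorithm 1 is not reproduced here, but a sketch of such a KKT-based bisection projection, assuming the quadratic cost in (3) with box bounds (all function and variable names are ours), is:

```python
import numpy as np

def project_budget(x, x_bar, cost, l, u, B, tol=1e-8, max_iter=100):
    """Euclidean projection of x onto
    {z : l <= z <= u, sum_j cost_j * (z_j - x_bar_j)^2 <= B},
    found by bisection on the KKT multiplier of the budget constraint."""
    def z_of(lam):
        # Unconstrained minimizer of 0.5*(z-x)^2 + lam*cost*(z-x_bar)^2,
        # clipped to the box bounds.
        z = (x + 2.0 * lam * cost * x_bar) / (1.0 + 2.0 * lam * cost)
        return np.clip(z, l, u)

    def excess(lam):
        z = z_of(lam)
        return np.sum(cost * (z - x_bar) ** 2) - B

    if excess(0.0) <= 0.0:        # box-clipped point already within budget
        return z_of(0.0)
    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:       # grow until the budget constraint holds
        hi *= 2.0
    for _ in range(max_iter):     # bisection: excess decreases monotonically
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return z_of(hi)
```

The monotone decrease of the budget excess in the multiplier is exactly what guarantees the bisection succeeds.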
3.2 Heuristic-based methods
We propose three real-valued heuristic-based algorithms to solve the generalized inverse classification problem: hill-climbing + local search (HC+LS), a genetic algorithm (GA), and a genetic algorithm + local search (GA+LS).
There are several processes shared among the three algorithms. For simplicity of notation, we assume the features of $\mathbf{x}$ indexed by $D$ are the first $|D|$ features. Let $j$ denote the indexical position of the feature vector that will be perturbed. Perturbations to feature $j$ occur according to a normal distribution, where the random variable representing the perturbation at indexical position $j$ is scaled by $\sigma_j$, the standard deviation of feature $j$ obtained from the training data. Let $\mathbf{e}_j$ be a vector that equals one in the $j$th coordinate and zero in other places, so that the perturbed version of $\mathbf{x}$ is denoted by $\mathbf{x} + \delta_j \mathbf{e}_j$. Let $\mathbf{P}_i$ represent the $i$th row of a matrix $\mathbf{P}$. Two additional shared parameters are the total population size and the number of iterations until an algorithm terminates.
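The shared perturbation step can be sketched as follows (names are ours):

```python
import numpy as np

def perturb(x, j, sigma, rng):
    """Shared perturbation step of the heuristics: add a normal draw,
    scaled by feature j's training-set standard deviation, to
    coordinate j of a copy of x (the original is left untouched)."""
    x_new = np.array(x, dtype=float)
    x_new[j] += sigma[j] * rng.standard_normal()
    return x_new
```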
3.2.1 Hill-climbing + local search
Our hill-climbing + local search (HC+LS) algorithm is based on that outlined by Mannino and Koushik and is related by Algorithm 3, which calls a local search procedure, outlined in Algorithm 2. In this algorithm, the best current solution is perturbed a single feature at a time in order to find a better solution. Several single-feature perturbations occur at each iteration, leading to correspondingly many perturbed versions of the current solution. We use Algorithm 1 to convert each perturbed point into a feasible state and update along the direction that yields the smallest objective value $f$, where $f$ is defined in (4).
We note here that the difference between regular HC and HC+LS is that HC operates on a first-improvement basis, whereas HC+LS operates on a best-improvement basis.
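The best-improvement loop can be sketched as follows, assuming a generic objective `f` and a projection routine standing in for Algorithm 1 (parameter names are ours, not the paper's):

```python
import numpy as np

def hill_climb_local_search(f, project, x0, sigma, n_perturb=10, n_iter=50, seed=0):
    """Best-improvement hill climbing: at each iteration, try several
    single-feature perturbations, project each onto the feasible set,
    and move to the best candidate if it improves the objective."""
    rng = np.random.default_rng(seed)
    best, best_val = np.array(x0, dtype=float), f(x0)
    for _ in range(n_iter):
        candidates = []
        for _ in range(n_perturb):
            j = rng.integers(len(best))          # feature to perturb
            cand = best.copy()
            cand[j] += sigma[j] * rng.standard_normal()
            cand = project(cand)                 # restore feasibility
            candidates.append((f(cand), cand))
        val, cand = min(candidates, key=lambda t: t[0])
        if val < best_val:                       # best improvement, not first
            best, best_val = cand, val
    return best, best_val
```

First-improvement HC would instead accept the first candidate that beats the incumbent.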
3.2.2 Genetic algorithm
Genetic algorithms are composed of four primary processes: initial population generation, crossover, carryover, and mutation. Our real-valued genetic algorithm (GA) is outlined by Algorithm 4. Prior to outlining such a method, we first relate the four aforementioned components.
At the first iteration of our genetic algorithm, an initial population is generated. For each member of the population, a feature index is drawn from a discrete uniform distribution and the corresponding feature is perturbed as in (8). We use the resulting chromosomes as the initial population and store them as the rows of a population matrix. This perturbation is repeated for each member, resulting in unique rows of the population matrix. Here, we apply (6) to ensure that all population chromosomes are feasible.
Following this, a simple sorting procedure is called, which orders the rows of the population matrix by objective function value from smallest to largest. A user-specified parameter denotes the proportion of the population that will be bred to produce the offspring for the next generation. We make a copy of the corresponding number of top rows and store them as a parent matrix, then randomly shuffle its rows using a shuffling procedure.

A second parameter gives the proportion of the population that should be composed of children (the remaining proportion of the population will be carried over, discussed shortly). We construct a vector of indices as in (11), where each entry is a uniformly distributed random index.
Selected chromosomes are bred using single-point crossover, as outlined in Michalewicz, 2013, adapted to maintain feasibility via our projection operator. Without loss of generality, we assume the number of children is even, so the parents form pairs. We use the index vector defined in (11) to create children from the matrix of parent chromosomes: a crossover point is generated for each pair of parents (i.e., each pair of rows of the parent matrix), and each child takes one parent's entries before the crossover point and the other parent's entries after it. Mut is the mutation operator, which perturbs an allele as in (8) with a user-specified probability of mutation (a binary random variable determines whether mutation occurs at each allele). Subsequently, the children are made feasible by applying the projection in (6).
These feasible children are then stored as the rows of a children matrix.
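The crossover-then-mutate step can be sketched as follows (a simplified version: consecutive rows are paired as parents, the parent count is assumed even, and the feasibility projection is omitted; all names are ours):

```python
import numpy as np

def crossover_and_mutate(parents, p_mut, sigma, rng):
    """Single-point crossover of consecutive parent pairs, followed by
    per-allele mutation: each gene is perturbed by a normal draw (scaled
    by its feature's standard deviation) with probability p_mut."""
    n, d = parents.shape                               # n assumed even
    children = np.empty_like(parents)
    for i in range(0, n - 1, 2):
        cut = rng.integers(1, d)                       # crossover point
        children[i] = np.concatenate([parents[i, :cut], parents[i + 1, cut:]])
        children[i + 1] = np.concatenate([parents[i + 1, :cut], parents[i, cut:]])
    mask = rng.random(children.shape) < p_mut          # which alleles mutate
    return children + mask * sigma * rng.standard_normal(children.shape)
```

In the full algorithm, each child would then be projected back onto the feasible set.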
The carryover procedure uses roulette wheel selection to select chromosomes from the current generation that will survive to the next. Chromosomes that have larger (i.e., better) fitness values (where fitness denotes solution quality) have a higher likelihood of surviving to the next generation.

First, we create an inverted solution vector: each objective value is subtracted from the worst-case solution possible, which is assumed to be positive. Using these inverted values, we construct a vector of selection probabilities by normalizing them to sum to one. Intuitively, higher-quality solutions have a larger probability of being selected, and the population has already been ordered by objective value. Using (16), we select chromosomes from the population matrix to be carried over to the next generation; those selected chromosomes are stored as the carryover matrix.
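A sketch of roulette-wheel carryover for a minimization objective, assuming the worst-case value is known (names are ours):

```python
import numpy as np

def roulette_select(objective_values, k, worst, rng):
    """Roulette-wheel carryover for minimization: invert each objective
    value against the worst case, normalize the inverted values into
    selection probabilities, and draw k surviving row indices."""
    inverted = worst - np.asarray(objective_values, dtype=float)
    probs = inverted / inverted.sum()                # smaller objective -> larger share
    return rng.choice(len(objective_values), size=k, p=probs)
```

With probability estimates as objective values, the worst case is simply 1.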
Using (10), (12), (13), and (17), we construct our GA as outlined by Algorithm 4. The procedure begins by initializing the best solution to the unperturbed chromosome. The algorithm then iterates a fixed number of times. At the first iteration, the initial population is generated. The current population is then evaluated and, if a better solution is found, the best solution is updated. Following this, a simple sorting procedure is called, which orders the population by objective function value from smallest to largest. Crossover chromosomes are then selected in an elitist fashion from this ordered matrix. Selected chromosomes are randomly shuffled, using a shuffling procedure, before crossover is applied to create the offspring chromosomes. Next, the carryover chromosomes are selected. Finally, the children and carryover chromosome matrices are concatenated to form the population for the next generation.
3.2.3 Genetic algorithm + local search
The third method is a genetic algorithm + local search (GA+LS), related by Algorithm 5. There are a few important distinctions between the original GA and the variant with local search applied. First, we reformulate the crossover procedure outlined by (12): the reader will note that here the mutation procedure is not applied.
Second, we incorporate the use of the local search (LS) procedure previously outlined in Algorithm 2, setting the parameter that dictates the extent of the search accordingly.
GA+LS is outlined by Algorithm 5. The differences between this method and the original GA are outlined in blue. At line 14 the LS procedure is applied to each of the non-mutated children. The best solution obtained from LS is the child chromosome that is kept for the next generation.
3.3 Sensitivity analysis-based methods
As discussed in Section 2, sensitivity analysis is closely related to inverse classification. Therefore, we propose two sensitivity analysis-based algorithms that serve as baselines against which the heuristic-based methods can be compared. To our knowledge, no past methods addressing this problem have been proposed; we therefore craft these ourselves and believe they represent a reasonable initial attempt at a solution. Such methods can be viewed as a combination of the local and variable perturbation methods of sensitivity analysis.
We refer to the first sensitivity analysis-based method as Local Variable Perturbation–Best Improvement (LVP-BI). This method calls for perturbing a single feature to the extent of feasibility given by its bounds and the remaining budget. The single-feature perturbation having the greatest objective function improvement is the one that is accepted. If some budget remains following this perturbation, subsequent perturbations are performed (e.g., double-feature, triple-feature, etc. perturbations).
Our second method, which we refer to as Local Variable Perturbation–First Improvement (LVP-FI), is very similar to LVP-BI. Instead of accepting the best perturbation over all features, it accepts the first perturbation that leads to a better objective function value, where the feature to perturb is selected at random.
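A single LVP-BI step can be sketched as follows, with the budget bookkeeping omitted for brevity (names are ours):

```python
import numpy as np

def lvp_bi_step(f, x, lower, upper):
    """One step of LVP-BI: push each feature, one at a time, to the
    limit of feasibility in either direction, and keep the single-feature
    move with the greatest objective improvement (if any)."""
    x = np.asarray(x, dtype=float)
    best_x, best_val = x, f(x)
    for j in range(len(x)):
        for bound in (lower[j], upper[j]):
            cand = x.copy()
            cand[j] = bound                 # perturb feature j to its limit
            val = f(cand)
            if val < best_val:
                best_x, best_val = cand, val
    return best_x, best_val
```

LVP-FI would instead scan features in random order and accept the first improving move.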
4 Experiments

In this section we first outline our choices regarding the parameters of the inverse classification framework and then apply our methods to two freely available datasets. Our experiments evaluate the five methods by examining the average likelihood of test instances conforming to a non-ideal class over varying budget constraints. First, we explore the capability of each algorithm in reducing the likelihood of test instances conforming to a non-ideal class. Additionally, we examine the perturbations made to an individual test instance, selected at random, by the top-performing algorithm. We wish to emphasize that practical, real-world use of these methods should be undertaken with experts in the domain of use. We further emphasize that inverse classification puts the individual at the center of the process and optimizes over his/her current values. Therefore, if an individual so chooses, he/she can adjust expert-specified costs according to his/her own outlook on what may be more or less difficult to change.
4.1 Experiment Parameters and Evaluation
There are three choices that need to be made regarding the established inverse classification framework: the learning algorithm, the indirectly changeable feature estimator, and the method we will use to set the lower- and upper-bounds that directly changeable features can take.
4.1.1 Objective Function
We selected the random forest classifier to evaluate each of the five methods. We chose this as it is (a) an ensemble classifier and (b) composed of weak-learner decision trees. Both (a) and (b) are separately non-differentiable and jointly highlight the need for the GIC formulation we have proposed. The returned objective function value is the proportion of decision trees in the ensemble voting in favor of the class to be minimized. As such, objective values lie in $[0, 1]$, and we can parameterize the worst-case objective function value in (15) as 1.
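The vote-proportion objective can be sketched as follows; here hypothetical stump-like callables stand in for the trees of a trained random forest:

```python
import numpy as np

def vote_fraction(ensemble, x):
    """Objective value for GIC: the proportion of base learners voting
    for the positive (to-be-minimized) class. This is a step function
    of x, hence non-differentiable, which is what motivates GIC."""
    votes = [clf(x) for clf in ensemble]   # each base learner returns 0 or 1
    return float(np.mean(votes))
```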
4.1.2 Indirectly Changeable Feature Estimation
We estimate each indirectly changeable feature using Nadaraya–Watson kernel regression,

$$H(\mathbf{x}) = \frac{\sum_{i=1}^{n} K(\mathbf{x}, \mathbf{x}_i)\, y_i^{I}}{\sum_{i=1}^{n} K(\mathbf{x}, \mathbf{x}_i)},$$

where $\mathbf{x}_i$ is a training instance, $y_i^{I}$ is its value of the indirectly changeable feature being estimated, and $K$ is the Gaussian kernel. We elect to use this function and the corresponding Gaussian kernel for its similarity-based estimation properties. We cross-validate this model on each of the indirectly changeable features in order to learn the best kernel parameter for each.
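A sketch of this estimator, assuming a single Gaussian bandwidth parameter (names are ours):

```python
import numpy as np

def nadaraya_watson(X_train, y_train, x, sigma):
    """Nadaraya-Watson estimate of an indirectly changeable feature:
    a Gaussian-kernel-weighted average of its training values."""
    X_train = np.asarray(X_train, dtype=float)
    d2 = np.sum((X_train - np.asarray(x, dtype=float)) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))         # Gaussian kernel weights
    return float(np.dot(w, y_train) / np.sum(w))
```

Instances similar to `x` dominate the weighted average, which is what makes the estimate similarity-based.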
4.1.3 Bound-setting method and cost function
Lash et al. outline two methods of specifying lower and upper bounds for the directly changeable features. Each results in different algorithmic behavior. In our experiments we use the hard-line bound-setting method. Under this method we specify, for each directly changeable feature, upper and lower bounds such that the feature can only either increase or decrease. If a feature should increase from its current value, we set its lower bound equal to that value; if it should decrease, we set its upper bound equal to that value. This allows us to maintain more control over what we know and believe to be the beneficial direction of feature movement. We do note, however, that under different circumstances (e.g., uncertainty) it may be beneficial to allow the optimization to learn the most beneficial direction of feature movement.
In this set of experiments we elect to explore the effects of non-linear costs, related by (3). We do so because non-linear costs, to the best of our knowledge, have not been explored in past works.
4.1.4 Evaluating Recommendations
To evaluate the success of the inverse classification we use an established procedure, originally outlined and later refined in prior work. This process entails initially splitting a dataset randomly into two equal parts, where the first is used for training the random forest model upon which inverse classification will take place. The second is the held-out set of data to which inverse classification will be applied.
The held-out set is further partitioned into distinct subsets. The process of evaluation entails performing inverse classification on each subset and using the remaining data to train a separate model that evaluates the success of the inverse classification. Such a process ensures that no information used to perform the inverse classification and obtain recommendations is used in evaluating how successful the process actually was. Additionally, this helps ensure that the classifier used to make the recommendations has not overfit the data.
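The splitting protocol can be sketched as follows (the fold count and seed are illustrative, and the function name is ours):

```python
import numpy as np

def evaluation_splits(n, k, seed=0):
    """Evaluation protocol: shuffle the n instance indices, split them
    into a training half and a held-out half, then partition the
    held-out half into k folds. Inverse classification is applied to
    each fold and judged by a model trained on the remaining data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    train, held = idx[: n // 2], idx[n // 2 :]
    folds = np.array_split(held, k)
    return train, folds
```

Because the recommending model and the evaluating model never share data, an improved score cannot be an artifact of overfitting the recommender.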
4.2 Student Performance: Grade-improving recommendations
Our first set of experiments is conducted on a UCI Machine Learning Repository dataset called Student Performance. This dataset consists of Portuguese students enrolled in two different classes: a math class and a Portuguese language class. The data are represented as two distinct, but overlapping, datasets; we elect to use the Portuguese language set as it has the larger number of instances.
4.2.1 Data Description
Each individual in Student Performance is initially represented by 45 features, including a unique identifier (discarded) and a class variable, which we define to be whether a student's final grade was above a C ($y = 0$) or, conversely, less than or equal to a C ($y = 1$). Our GIC methods will attempt to reduce the likelihood of earning a grade of C or worse. We discard the two intermediary grade reports to reflect the long-term goal of earning a higher grade overall and to make the problem more realistic. The full set of features and corresponding parameters can be viewed in the Supplemental Material.
The parameters set for the three heuristic-based methods in these experiments are related by Table 1, as is the computational complexity. We arrived at these after a brief exploration of the parameter space, selecting values that were comparable so that performance could be equivalently compared. For GA+LS, we kept (abbreviated ) lower because of the added complexity of the parameter.
Table 1: Parameter settings and computational complexity for HC+LS, GA, and GA+LS.
We first examine the success of reducing the average predicted probability for each of the five methods. These results are reported in Figure 1. We report each over 15 increasing budgetary constraints. Additionally, we include the best result on a randomly selected positively classified instance – Student 57 – obtained using GA.
As we can observe in Figure 1 the two sensitivity analysis-based methods were unsuccessful. The result also shows that the three heuristic-based methods are comparable, with GA and GA+LS declining slightly faster than HC+LS. We include more detailed information about the performance of each method in the Supplemental Materials.
We report the changes made to “Student 57” in Figure 2 for the method most successful in reducing their predicted probability: GA. We report this so that the reader may have a better idea of what such recommendations look like. GA recommends the student to increase study time and curb weekday alcohol consumption, as well as to decrease time out with friends.
Cumulatively, the three heuristic methods were, on average, able to reduce the probability from approximately 70% to 62% at a budget level of three. Individually, the best-performing method was able to reduce Student 57's probability from 70% to 50% at a budget level of five.
4.3 Cardiovascular disease mitigating lifestyle recommendations
Our second set of experiments is conducted on a real-world patient dataset, derived from the ARIC study. These data are freely available upon request from BioLINCC.
4.3.1 Data Description
These data represent patients for whom we have known cardiovascular disease (CVD) outcomes over a 10-year period. There are 110 defined features for each patient. Patients who, during the course of the 10-year period, had probable myocardial infarction (MI), definite MI, suspect MI, definite fatal coronary heart disease (CHD), possible fatal CHD, or stroke have $y = 1$, and $y = 0$ otherwise. Patients who had a pre-existing CVD event are excluded from our dataset. This set of experiments is meant to more closely reflect a real-world scenario and, as such, is guided by a CVD specialist. The full list of features, their feature designation (e.g., changeable) and parameters (e.g., cost) can be viewed in the Supplemental Materials.
After a brief exploration of the parameter space, we arrived at the same set of parameters as in the previous experiment (Student Performance); we omit the duplicate table and refer to Table 1. Additionally, because of the size of the testing dataset and the computational complexity associated with the heuristic-based methods, we elected to test on a subset of the data. We used all 587 positive test instances and another 587 randomly selected negative test instances, giving us a final evaluative test set of 1174 instances. Evaluation models were constructed using the full set of data by the procedure outlined in Section 4.1.4.
We first examine the success of reducing the average predicted probability using the five outlined methods. These results are reported in Figure 3. We report each over 15 increasing budgetary constraints. Additionally, we include the best result on a randomly selected positively classified instance – Patient 29 – obtained using GA+LS.
The results obtained for the heuristic-based methods are similar to those on Student Performance. There is a striking difference, however, between those and the sensitivity analysis-based results here. We observe that LVP-FI outperforms all other methods, while LVP-BI is comparable to GA and GA+LS. HC+LS performs the worst. The stark difference in the performance of LVP-FI and LVP-BI on this dataset vs. Student Performance may suggest that there are instances in which it is advantageous to use sensitivity analysis-based methods over those that are heuristic-based, and vice versa. We leave such an analysis for future work.
We report the changes made to "Patient 29" in Figure 4 for the method most successful in reducing the patient's predicted probability: GA+LS. Here we observe that the number of recommended feature changes is quite large: there are 22 of them. This suggests that it may be beneficial to include sparsity constraints.
Cumulatively, these results show that, on average, risk can be taken from approximately 50% to 30–35%, depending upon the method, at a budget level of two. At the individual level, using the best method, Patient 29's risk can be lowered from 55% to less than 30%, also at a budget level of two.
5 Conclusions

In this work we propose and solve generalized inverse classification, working backward through the previously un-navigable random forest classifier using five proposed algorithms that we incorporate into a framework, updated to account for non-linear costs, that leads to realistic recommendations. Future work is needed to analyze the instances in which one method may outperform another, the performance of other classifiers, and constraints limiting the number of features that are changed.
- S. S. Isukapalli, Uncertainty Analysis of Transport-transformation Models. PhD thesis, Rutgers University, 1999.
- J. Yao, “Sensitivity analysis for data mining,” in Proceedings of the 22nd International Conference of the North American Fuzzy Information Processing Society (NAFIPS 2003), pp. 272–277, July 2003.
- C. C. Aggarwal, C. Chen, and J. Han, “The inverse classification problem,” Journal of Computer Science and Technology, vol. 25, no. 3, pp. 458–468, 2010.
-  C. L. Chi, W. N. Street, J. G. Robinson, and M. A. Crawford, “Individualized patient-centered lifestyle recommendations: An expert system for communicating patient specific cardiovascular risk information and prioritizing lifestyle options,” Journal of Biomedical Informatics, vol. 45, no. 6, pp. 1164–1174, 2012.
-  C. Yang, W. N. Street, and J. G. Robinson, “10-year CVD risk prediction and minimization via inverse classification,” in Proceedings of the 2nd ACM SIGHIT symposium on International health informatics - IHI ’12, pp. 603–610, 2012.
- M. V. Mannino and M. V. Koushik, “The cost-minimizing inverse classification problem: a genetic algorithm approach,” Decision Support Systems, vol. 29, no. 3, pp. 283–300, 2000.
- D. Barbella, S. Benzaid, J. Christensen, B. Jackson, X. V. Qin, and D. Musicant, “Understanding support vector machine classifications via a recommender system-like approach,” in Proceedings of the International Conference on Data Mining, pp. 305–311, 2009.
-  P. C. Pendharkar, “A potential use of data envelopment analysis for the inverse classification problem,” Omega, vol. 30, no. 3, pp. 243–248, 2002.
- M. T. Lash, Q. Lin, W. N. Street, and J. G. Robinson, “A budget-constrained inverse classification framework for smooth classifiers,” arXiv preprint arXiv:1605.09068, 2016.
-  M. T. Lash and K. Zhao, “Early predictions of movie success: The who, what, and when of profitability,” Journal of Management Information Systems, vol. 33, no. 3, pp. 874–903, 2016.
- Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Springer Science & Business Media, 2013.
-  L. Breiman, “Bagging Predictors,” Machine Learning, vol. 24, pp. 123–140, 1996.
-  Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” Thirteenth International Conference on Machine Learning, pp. 148–156, 1996.
-  L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.
- E. A. Nadaraya, “On estimating regression,” Theory of Probability & Its Applications, vol. 9, no. 1, pp. 141–142, 1964.
- G. S. Watson, “Smooth regression analysis,” Sankhyā: The Indian Journal of Statistics, Series A, vol. 26, no. 4, pp. 359–372, 1964.
-  P. Cortez and A. M. G. Silva, “Using data mining to predict secondary school student performance,” in Proceedings of 5th Annual Future Business Technology Conference, EUROSIS, 2008.
Supplemental Material – Generalized Inverse Classification
These tables show the unchangeable, indirectly changeable, and directly changeable features for each of our two freely available datasets. For each of the indirectly changeable features, the kernel regression parameter is also included.
Student Performance – unchangeable features:
|School Attended, Sex, Age, Address, Size of family, Parent’s cohabitation status, Mother’s education, Father’s education, Mother’s job=”At Home”, Mother’s job=”Health”, Mother’s job=”Other”, Mother’s job=”Services”, Mother’s job=”Teacher”, Father’s job=”Teacher”, Father’s job=”Other”, Father’s job=”Services”, Father’s job=”Health”, Father’s job=”At Home”, Reason for school=”Course”, Reason for school=”Other”, Reason for school=”Home”, Reason for school=”Reputation”, Guardian=”Mother”, Guardian=”Father”, Guardian=”Other”, Time spent traveling to school|
Student Performance – indirectly changeable features (kernel regression parameter):
|Extra-curricular activities: 1.5, Higher education aspirations: 1.0, In a romantic relationship: 1.5, Free time after school: 1.0|
Student Performance – directly changeable features (cost):
|Study time: 7, Paid tutoring: 8|
|Time out with friends: 6, Weekday alcohol: 3, Weekend alcohol: 6, Absences from class: 5|
ARIC – unchangeable features:
|Insulin (uu-ml), Height (cm), Age, Peripheral Artery Disease, Peripheral Artery Disease (definition 2), Plaque/shadowing in either internal, Plaque in either internal carotid, Cholesterol lowering med (last 2 weeks), Hypertension (definition 5), Education level, Diabetes, Age when menopause began, Menopause status, Ever smoked cigarettes, High blood pressure med (past 2 weeks), Angina-chest pain med (past 2 weeks), Heart rhythm control med (past 2 weeks), Heart failure med (past 2 weeks), Blood thinning med (past 2 weeks), Blood sugar med (past 2 weeks), Stroke med (past 2 weeks), Walking leg pain med (past 2 weeks), Headache or cold med (past 2 weeks), Pain meds (past 2 weeks), Gender, Race, Years smoked cigarettes|
ARIC – indirectly changeable features (kernel regression parameter):
|BMI (Body Mass Index): .5, Recalibrated HDL cholesterol (mg/dl): .5, Recalibrated LDL cholesterol (mg/dl): .5, Total cholesterol (mmol/L): .5, Total triglycerides (mmol/L): .5, 2nd and 3rd systolic blood pressure (avg.): .5, 2nd and 3rd systolic blood pressure (avg.) Num 2: .5, Waist girth (cm): .5, Hip girth (cm): .5, Heart rate: .5, White blood count: .5, Apolipoprotein AI (mg-dl): .5, Apolipoprotein B (mg-dl): .5, Apolp(A) Data (ug-ml): .5, Ankle-brachial index (Def 4): .5, FV(1)/FVC Predicted (%): .25, FEV(1) (L): .5, FVC (L): .5, Hematocrit: .5, Hemoglobin: .5, Platelet count: .5, Neutrophils: .5, Neutrophil bands: .5, Lymphocytes: .5, Monocytes: .5, Eosinophils: .5, Basophils: .5, APTT Value: .5, VIII: C Value: .5, Fibrinogen Value: .5, VII Value: .5, ATIII Value: .5, Protein: C Value: .5, VWF Value: .5|
|Cornell voltage (uV): .5, Waist-hip ratio: .5, Vegetable fat (% kcal): .5, Carbs (% kcal): .5, Alcohol (% kcal): .5, Omega fatty acid (g): .5, Calf girth (cm): .5, Subcaps measure 2 (mm): .5, Triceps measure 2 (mm): .5, Uric acid (mg-dl): .5, Total protein (gm-dl): .5, Albumin (gm-dl): .5, Phosphorus (mg-dl): .5, Magnesium (meq-l): .5, Calcium (mg-dl): .5, Urea nitrogen (mg-dl): .5, Potassium (mmol-l): .5, Sodium (mmol-l): .5, Creatinine (mg-dl): .5, Weight (lb): .5, Total fat (% kcal): .5, Saturated fatty acid (% kcal): .5, Protein (% kcal): .5, Polyunsaturated fatty acid (% kcal): .5, Monounsaturated fatty acid (% kcal): .5, Total fat (g): .25|
ARIC – directly changeable features (cost):
|Dark or grain breads: 3, Peanut butter: 4, Nuts: 5, Other (prunes, avocado): 5, Vegetables: 6, Fruit: 6, Fiber: 7, Vegetable fat: 5, Polyunsaturated fat: 5|
|Liver: 8, White carbs: 6, Fish: 9, Cereal: 4, Cigarettes: 9, Caffeine: 7, Carbs: 7, Cholesterol: 6, Sodium: 7, Animal fat: 7, Saturated fat: 6|
|Exercise hours: 10, Alcohol: 9|
These figures show additional algorithm-specific results that supplement and support certain conclusions made in the main content of the paper. Here, red shows the average probability, yellow shows the probability for a randomly selected instance, and blue shows the 5th and 95th percentiles of the probabilities.