UQ-CHI: An Uncertainty Quantification-Based Contemporaneous Health Index for Degenerative Disease Monitoring

02/21/2019 · by Aven Samareh, et al. · University of Washington

Developing a knowledge-driven contemporaneous health index (CHI) that can precisely reflect the underlying patient condition across the course of the condition's progression holds unique value, such as facilitating a range of clinical decision-making opportunities. This is particularly important for monitoring degenerative conditions such as Alzheimer's disease (AD), where the condition of the patient deteriorates over time. Detecting early symptoms and signs of progression, and continuously evaluating severity, are all essential for disease management. While a few methods have been developed in the literature, uncertainty quantification for these health index models has been largely neglected. To ensure continuity of care, we should be more explicit about the level of confidence in model outputs. Ideally, decision-makers should be provided with recommendations that are robust in the face of substantial uncertainty about future outcomes. In this paper, we aim to fill this gap by developing an uncertainty quantification based contemporaneous longitudinal index, named UQ-CHI, with a particular focus on continuous patient monitoring of degenerative conditions. Our method combines convex optimization and Bayesian learning using the maximum entropy learning (MEL) framework, integrating uncertainty on labels as well. Our methodology also provides closed-form solutions for some important decision-making tasks, such as predicting the label of a new sample. Numerical studies demonstrate the effectiveness of the proposed UQ-CHI method in prediction accuracy and monitoring efficacy, and the unique advantages that uncertainty quantification enables in practice.


1 Introduction

The effective monitoring of degenerative patient conditions represents a significant challenge in many clinical decision-making problems and has given rise to the development of numerous mathematical and computational models brownell1999dopamine ; gratwicke2017early ; llano2017multivariate ; chen2014credit . Developing a knowledge-driven contemporaneous health index (CHI) that can precisely reflect the underlying patient condition across the course of the condition’s progression holds unique value, such as facilitating a range of clinical decision-making opportunities spring2013healthy ; rivera2012optimized ; deshpande2014control , enhancing the continuity of care, and facilitating communication between clinicians, healthcare providers, and patients. It will also be a crucial enabling factor for the development of many envisioned AI systems that implement adaptive interventions for better healthcare management, since such systems require a representation of the dynamic evolution of the patient’s condition.

Thus, to ensure continuity of care, we should be more explicit about our level of confidence in model outputs. Ideally, decision-makers should be provided with recommendations that are robust in the face of substantial uncertainty about future outcomes. However, computational models are an abstraction of clinical observations; as such, they are usually built on analytically tractable assumptions that may simplify the real-world problem. Also, most of these models are estimated from imperfect data, subjecting them to all kinds of statistical errors. An approach that yields only a single prediction does not adequately reflect the uncertainty in either the empirical data or the estimated parameters allmaras2013estimating . As a result, the outcomes from such mathematical models may not be consistent with clinical observations. Uncertainty is an unavoidable feature that affects prediction capabilities in real-world domains such as healthcare hoffman1994propagation ; meghdadi2017brain , manufacturing montomoli2015uncertainty ; nannapaneni2014uncertainty , and signal processing reynders2016uncertainty ; nobari2015uncertainty , among others. A certain amount of uncertainty is always involved when the experimental data are insufficient to calibrate a decision-making system. In such cases, there is always a chance that the model parameters cannot be determined unambiguously, even with complex mathematical models. In clinical prediction, it is necessary to deal with such uncertainty in an effective manner, because if the model parameters are not well constrained, the resulting predictions may carry an unacceptable degree of posterior uncertainty. What is more, while most existing models for patient monitoring generate a single prediction without reporting a confidence level, uncertainty quantification can tell us for which samples we may not be ready to act on the model's output. Therefore, to develop a reliable model for clinically relevant prediction, uncertainty quantification is a much-needed capacity collis2017bayesian ; biglino2017computational ; bozzi2017uncertainty .

A number of patient monitoring index approaches have been developed in the literature. A standard formulation of these health indices is to use weighted sum models (e.g., regression models) that combine multiple static clinical measurements to predict the disease condition. For example, there exist many risk score models to predict AD by using multi-modality data integration methods liu2013data ; yuan2012multi ; zhang2011multimodal to combine neuroimaging data weiner2013alzheimer ; weiner20152014 , genomics data biffi2010genetic , clinical data reitz2010summary , etc. A few approaches have formulated the decline of AD-related scores over time as a multi-task learning model zhou2013modeling ; zhou2012modeling . These existing efforts have been limited to combining static data rather than longitudinal data. Besides, these data are usually sampled at irregular time points, which adds another layer of complexity to the modeling effort. Our problem’s objective is fundamentally different from the existing risk score models; we focus on developing the contemporaneous health index (CHI) that can fuse irregular multivariate longitudinal time series data to quantify the severity of degenerative disease conditions while fitting the monotonic degradation process of the disease. For example, in our previous work samareh2018dl , to address patient heterogeneity, we developed a dictionary learning-based contemporaneous health index for degenerative disease monitoring, called DL-CHI, that leveraged the knowledge of the monotonic disease progression process to fuse the data by integrating CHI with dictionary learning. The basic idea of DL-CHI was to learn individual models via the CHI formulation, and then reconstruct the model parameters of each patient’s model through supervised dictionary learning. However, both the CHI and DL-CHI frameworks only generate a single prediction value for a sample and ignore the sampling uncertainty (in healthcare, label information is often obtained by subjective methods and is therefore itself subject to uncertainty). Therefore, if we could enable CHI to conduct uncertainty quantification and incorporate the uncertainty in labels in its modeling, we could widen its applicability in real-world contexts. The main objective of this paper is to build on the contemporaneous health index (CHI) developed in huang2017chi and equip it with uncertainty quantification capacity.

In this paper, we develop the uncertainty quantification based contemporaneous longitudinal index, named UQ-CHI, with a particular focus on continuous patient monitoring of degenerative conditions. Our method combines convex optimization and Bayesian learning using the maximum entropy learning (MEL) framework, integrating uncertainty on labels as well. The basic idea of MEL is to identify the distribution of the parameters of a statistical model that bears the maximum uncertainty, a principle that is conservative and robust mackay2003information ; izenman2008modern ; phillips2006maximum . It has been investigated in a few machine learning models jaakkola2000maximum ; sun2013multi ; chao2019semi ; zhu2018semi as well. For example, in jaakkola2000maximum , MEL was used to learn a distribution of the parameters of the support vector machine model rather than a single vector of parameters. This distribution of the parameters helps evaluate the uncertainty of the learned support vector machine model and translates into the uncertainty of predictions.

To adapt the MEL formulation and develop UQ-CHI, a few challenges need to be addressed. The objective function of MEL, as its distinct feature, bears the full spirit of maximum entropy: no matter what model we are studying, the learning objective of MEL is to learn the distribution of that model's parameters that has the maximum entropy. If there is a prior distribution of the parameters, the Kullback–Leibler divergence can be used to extend this idea. In our case, the design of the prior distribution should be studied to account for label uncertainties. Besides the objective function, MEL encodes information from the data into constraints; e.g., if the model is for classification, for each sample there would be a constraint that the expectation of the prediction over the distribution of the parameters should match the observed outcome on this sample. In our case, we will derive the constraints from the CHI model and integrate them with the MEL framework. In detail, we consider two steps in our method, i.e., training and prediction. In the training step, we consider a prior uncertainty over the labels to handle uncertain or incomplete labels. We then derive a solution to the optimization problem by using a specific prior formulation. In the second step, we develop a prediction method, with a rejection option, for new samples using the obtained uncertainty quantification capacity. A distinct feature of our model is that it provides a closed-form solution for predicting the label of a new example. The whole pipeline of the UQ-CHI model is shown in Figure 1.

Figure 1: A conceptual overview of the UQ-CHI method

The remainder of this paper is organized as follows: in Section 2, we review related literature on modeling the contemporaneous health index for degenerative conditions and the MEL framework. In Section 3, the UQ-CHI framework is presented. In Section 4, we implement and evaluate UQ-CHI using a simulated dataset. We then continue the numerical analysis with a real-world application on an Alzheimer’s disease dataset in Section 5. We conclude the study in Section 6. Note that, in this paper, we use lowercase letters, e.g., x, to represent scalars, boldface lowercase letters, e.g., v, to represent vectors, and boldface uppercase letters, e.g., W, to represent matrices.

2 Related works

In this section, we will first briefly present the basic formulation of the contemporaneous health index (CHI) model and its extension, the dictionary learning based contemporaneous health index (DL-CHI), and then review the MEL formulation that underlies the proposed UQ-CHI model.

2.1 The CHI model

The CHI model, developed in huang2017chi , exploits the monotonic pattern of the disease over the course of progression to improve the fusion of multivariate clinical measurements taken at irregular time points. The CHI framework was inspired by the common characteristic of degenerative conditions (e.g., AD) that they often cause irreversible degradation. For example, in AD, a number of biomarkers have been developed to measure the degradation of the neural system, including neuroimaging modalities such as PET and MRI scans mueller2005alzheimer ; petrella2003neuroimaging . MRI scans show a decline in brain volume over time along with the disease progression, and the same phenomenon can be observed on PET scans, where there is a persistent decline of metabolic activity. Such monotonic patterns indicate that once disease progression has started, the condition tends to deteriorate steadily over time. The task of CHI is to translate multivariate, longitudinal, and irregular clinical measurements into a contemporaneous health index that captures the patient’s changing condition over the course of progression. Note that clinical measurements for each patient may be taken over different lengths of time and at different time points. Targeting degenerative conditions, CHI is designed to be monotonic over time, with a higher index representing a more severe condition. CHI is a latent structure; hence, clinical variables associated with it should be measured over time to provide data for learning the index.

Let a training set of patients be given. Each measurement is the value of a given variable for a given subject at a given time point, where time is indexed per subject. Our goal is, given the training set, to convert the measurements of each subject into a health index, which requires a mathematical model of the index. For simplicity, a linear multivariable form of the hypothesis function was studied in huang2017chi , i.e., a weighted sum of the variables in which a vector of weight coefficients combines them. The total numbers of positive and negative samples are denoted accordingly. The formulation of the CHI learning framework is shown below:

(1a)
(1b)
(1c)
(1d)
(1e)
(1f)

Items in (1) can be explained as follows:

  • The first term (1a) and the second term (1b) are derived from a general formulation of the support vector machine (SVM). These two terms are used to enhance the discriminatory power of CHI by utilizing the label information. Here, the label of each sample indicates whether the corresponding subject has the disease or not.

  • To accommodate the monotonic pattern of disease progression, and to enforce the monotonicity of the learned health index, the term (1c) is introduced; it penalizes decreases of the index between successive time points. Here, the relevant quantity is the difference of two successive data vectors.

  • To encourage the homogeneity of CHI within each group that shares the same health status, the terms (1d) and (1e) are introduced. Here, the centers of the data vectors at each time point are computed over all positive and all negative samples, respectively.

  • To encourage sparsity over the features, a sparsity-inducing norm penalty is used, as shown in the last term (1f).

The CHI formulation can be solved by using the block coordinate descent algorithm illustrated in huang2017chi . Note that the CHI formulation generalizes many existing models, such as SVM, sparse SVM, LASSO, etc.
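To make the linear index concrete, the following minimal sketch (Python; illustrative only, with assumed variable names and weights rather than a fitted CHI model) computes a weighted-sum health index for one patient's irregular longitudinal measurements and the amount by which it violates the monotonicity requirement that term (1c) penalizes:

    import numpy as np

    def health_index(X, w):
        # Weighted-sum index: one value per visit; higher means more severe.
        return X @ w

    def monotonicity_violation(X, w):
        # Total decrease of the index between successive visits; the CHI
        # formulation penalizes such decreases to enforce monotone progression.
        h = health_index(X, w)
        return np.sum(np.maximum(0.0, h[:-1] - h[1:]))

    # One illustrative patient: 4 irregularly spaced visits, 3 clinical variables.
    X = np.array([[0.1, 0.2, 0.0],
                  [0.3, 0.2, 0.1],
                  [0.5, 0.4, 0.2],
                  [0.9, 0.5, 0.4]])
    w = np.array([0.6, 0.3, 0.1])          # illustrative weight vector
    print(health_index(X, w))              # non-decreasing index values
    print(monotonicity_violation(X, w))    # 0.0 for this patient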

2.2 The DL-CHI model

The CHI formulation is designed to learn a model for the average of a population, and thus ignores patient heterogeneity. Patients who suffer from AD have very heterogeneous progression patterns cummings2000cognitive ; folstein1989heterogeneity ; friedland1988alzheimer . Building a personalized model on an individual basis could account for this heterogeneity; however, such models require a significant amount of labeled training samples, which is not feasible in such clinical settings. Towards this goal, the DL-CHI approach was further developed in samareh2018dl by integrating CHI with dictionary learning olshausen1996emergence ; cummings2000cognitive . Dictionary learning algorithms reconstruct input signals as approximated signals via a sparse linear combination of a few dictionary elements or bases wright2009robust (each column of the dictionary represents a basis vector). Dictionary learning algorithms can reveal the hidden structures in the data (in a similar spirit as principal component analysis) by spanning the space of personalized models and capturing patient heterogeneity. They also play a role in regularizing model learning, in the sense that each dictionary basis vector can be viewed as a numerical representation of patient heterogeneity. Thus, dictionary learning algorithms can improve classification performance. Translating this wisdom into DL-CHI, the basic idea is to first learn individual models through the CHI formulation and then reconstruct the model parameters of the individually learned models via supervised dictionary learning. As such, each model is represented as a sparse linear combination of the basis vectors. Numerous experiments on both simulated and real-world data have shown the effectiveness of DL-CHI in creating personalized CHI models.

Despite accounting for patient heterogeneity, DL-CHI ignores the sampling uncertainty, which limits its applicability in real-world applications. This motivates us to enable CHI to conduct uncertainty quantification.

2.3 The MEL formulation

As mentioned in Section 1, the MEL formulation has a distinct objective function that aims to learn the distribution of the parameters of a model that encodes maximum uncertainty (i.e., evaluated by the entropy concept). It also has constraints that encode information from the data; e.g., if the model is for classification, for each sample there would be a constraint that the expectation of the prediction over the distribution of the parameters should match the observed outcome on this sample. To further illustrate some details, one typical application of MEL is the maximum entropy discrimination (MED) method, which applies MEL to classification models.

Let us consider a binary classification problem where the response variable $y$ takes values from $\{-1, +1\}$. Let $\mathbf{x}$ be an input feature vector and $F(\mathbf{x}; \theta)$ be a discriminant function parameterized by $\theta$, e.g., a linear function of $\mathbf{x}$. The training set is defined by the pairs $\{(\mathbf{x}_t, y_t)\}_{t=1}^{T}$. The classification margin is defined as $y_t F(\mathbf{x}_t; \theta)$; it is large and positive when the label agrees with the prediction, and the hinge loss is a non-increasing function of this margin. Traditional learning machines such as max-margin methods learn the optimal parameter setting by minimizing the empirical loss and the regularization penalty as shown below:

\hat{\theta} = \arg\min_{\theta} \left\{ \sum_{t=1}^{T} L\big(y_t F(\mathbf{x}_t; \theta)\big) + R(\theta) \right\}    (2)

Here, $L(\cdot)$ is the loss function, which is a non-increasing and convex function of the margin, and $R(\theta)$ is the regularization penalty. However, MED considers a more general problem of finding a distribution over the parameters $\theta$ and the classification margin parameters $\gamma$. This is done by minimizing the relative entropy with respect to some prior target distribution under certain margin constraints. Specifically, suppose that a prior distribution, denoted as $p_0(\theta, \gamma)$, is available; MED then learns a distribution $p(\theta, \gamma)$ by solving a regularized risk minimization problem. When the prior distribution is not a uniform distribution, maximizing the entropy generalizes to minimizing the relative entropy (or Kullback-Leibler divergence) to the prior together with the regularization penalty, as follows (penalizing larger distances from the prior):

(3)

Here, $c$ is a constant and the remaining term is the hinge loss that captures the large-margin principle underlying the MED prediction rule:

(4)

And the KL divergence is defined as follows:

KL\big(p(\theta, \gamma) \,\|\, p_0(\theta, \gamma)\big) = \int p(\theta, \gamma) \, \log \frac{p(\theta, \gamma)}{p_0(\theta, \gamma)} \, d\theta \, d\gamma    (5)

Here in (3), the classification margin quantities $\gamma_t$ are included as slack variables in the optimization; they represent the minimum margin that each sample must satisfy. MED considers an expectation form of the traditional approaches and casts Eq. (2) as an integration, and the classification constraints are also applied in an expected form. As a result, MED no longer finds a fixed set of parameters but a distribution over them, and it uses a convex combination of discriminant functions, rather than one single discriminant function, to perform model averaging for decisions. In particular, the MED formulation finds the distribution that is as close as possible to the prior distribution over all parameters, in terms of KL-divergence, subject to various moment constraints. This analogy extends to cases where the distributions are also over unlabeled samples, missing values, or other probabilistic entities that are introduced when designing the discriminant function. Correspondingly, MED is an effective approach to learn a discriminative classifier while accounting for uncertainties over model parameters, combining generative and discriminative learning sun2018multi ; zhu2018semi . This generalization facilitates a number of extensions of the basic approach, including the uncertainty quantification described in this paper. The present work contributes a novel generalization of the CHI formulation by integrating MED to perform uncertainty quantification.
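To illustrate the MED idea numerically, the following sketch (Python; a toy example under simplifying assumptions, not the UQ-CHI training routine) uses a standard Gaussian prior over a linear classifier's weights and fixed unit margins, for which the log-partition function has a closed form and the dual reduces to a small bound-constrained optimization; the resulting posterior over the weights is Gaussian, and predictions average the discriminant over it:

    import numpy as np
    from scipy.optimize import minimize

    # Toy data: two roughly separable classes in 2-D.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(+1.0, 0.5, (20, 2)), rng.normal(-1.0, 0.5, (20, 2))])
    y = np.hstack([np.ones(20), -np.ones(20)])

    # With a standard Gaussian prior p0(w) = N(0, I), a linear discriminant w.x, and
    # fixed unit margins, log Z(lam) = 0.5*||sum_t lam_t y_t x_t||^2 - sum_t lam_t,
    # so the dual is: maximize sum_t lam_t - 0.5*||sum_t lam_t y_t x_t||^2, lam >= 0.
    def neg_dual(lam):
        v = (lam * y) @ X                   # sum_t lam_t y_t x_t
        return 0.5 * v @ v - lam.sum()

    c = 5.0                                 # cap mimicking the soft bound a margin prior induces
    res = minimize(neg_dual, x0=np.zeros(len(y)),
                   bounds=[(0.0, c)] * len(y), method="L-BFGS-B")
    lam = res.x

    # The posterior over w is Gaussian, N(mean_w, I); predictions use the sign of
    # the expected discriminant (model averaging rather than a single parameter).
    mean_w = (lam * y) @ X
    pred = np.sign(X @ mean_w)
    print("training accuracy:", (pred == y).mean())

UQ-CHI works with richer ingredients (the CHI-derived constraints, a label prior, and a non-degenerate margin prior), but the same recipe of solving a dual over non-negative multipliers appears in Section 3.2.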

3 The proposed work: the UQ-CHI model

The overall goal of UQ-CHI is to learn a distribution over the parameters of the CHI model. An additional goal is that this could be done even if only partial labels are given, and even when the available labels carry uncertainty. Therefore, the first step in constructing UQ-CHI is to create the constraint structure. To design UQ-CHI, we incorporate features from the original formulation of CHI in Eq. (1) as follows. First, we utilize the label information by defining the discriminant function, which corresponds to (1b). We then incorporate the distinct feature of the CHI formulation, the monotonicity regularization function, which corresponds to Eq. (1c). Note that we will not incorporate the additional terms in Eq. (1d) and Eq. (1e), as they demand full knowledge of the labels of the samples. In addition, we do not include the sparsity regularization term (1f), since our focus is to learn a distribution over the parameters rather than a single parameter vector. Also, our model can still induce sparsity, e.g., by imposing a Laplace prior distribution on the parameters, as is done in the Bayesian Lasso model park2008bayesian .

In the following subsections, we will introduce how we design the prior distributions, the constraints, and how to derive computational algorithms and closed-form solutions for training and prediction.

3.1 Design of constraints and prior distributions

As aforementioned, there are two types of constraints that we extract from the CHI formulation into the development of UQ-CHI. One corresponds to the discriminant function used in CHI to generate predictions on samples, while the other corresponds to the monotonicity regularization function. Based on the CHI formulation, a perfect model would satisfy the classification constraints for every labeled sample and the monotonicity constraints for every pair of successive time points. As such a perfect model may not exist, a set of margin variables is introduced. We consider an expectation form of the previous approach and cast Eq. (1) as an integration; hence, the constraints are applied in an expected sense. This leads to the following formulation for the constraints:

(6a)
(6b)

Here, the term (6a) involves the discriminant function and the term (6b) involves the monotonicity regularization function, and the expectations are taken with respect to the joint distribution over the model parameters, the labels, and the margin variables. With this distribution, we can derive the prediction rule by averaging the discriminant function over it; a plausible instantiation of these expected constraints under a linear index is sketched below.
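For intuition only, assuming a linear index $\mathbf{w}^\top \mathbf{x}$ and illustrative notation (the symbols $\mathbf{x}_i(t)$, $\tilde{y}_i$, $\gamma_i$, and $\delta_{i,t}$ are ours, not necessarily the paper's exact ones), the two expected constraints could take a form such as:

    % Illustrative sketch only; not the original formulation's exact notation.
    \int p(\mathbf{w}, \tilde{y}, \boldsymbol{\gamma}) \left[ \tilde{y}_i \, \mathbf{w}^\top \mathbf{x}_i(t) - \gamma_i \right] d\mathbf{w} \, d\tilde{y} \, d\boldsymbol{\gamma} \;\ge\; 0    (cf. (6a))
    \int p(\mathbf{w}, \tilde{y}, \boldsymbol{\gamma}) \left[ \mathbf{w}^\top \big( \mathbf{x}_i(t{+}1) - \mathbf{x}_i(t) \big) - \delta_{i,t} \right] d\mathbf{w} \, d\tilde{y} \, d\boldsymbol{\gamma} \;\ge\; 0    (cf. (6b))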

Now we move on to the design of the prior distribution. It is natural to decompose the joint prior distribution over the model parameters, the labels, and the margin variables as a product of three distributions:

p_0(\mathbf{w}, \tilde{y}, \boldsymbol{\gamma}) = p_0(\mathbf{w}) \, p_0(\tilde{y}) \, p_0(\boldsymbol{\gamma})    (7)

In what follows we discuss each of the three prior distributions. Specifically, it is reasonable to assign a level of uncertainty to each example when defining the label prior: a simple solution is to put all of the prior mass on the observed label whenever it is available, and to use a non-informative prior otherwise. To define the prior over the parameters, we choose a Gaussian distribution with a specified mean vector and an identity covariance matrix. To define the prior over the margin variables, we assume that it factorizes across the samples. Further, following the idea proposed in jaakkola2000maximum , we set the individual margin priors so that the mean of the prior lies slightly below one; the idea of this distribution is to incur a penalty only for margins smaller than this quantity, while margins larger than it are not penalized (a concrete form is sketched below). More details about the design of prior distributions will be given in Section 3.4.
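For concreteness, the margin prior proposed in jaakkola2000maximum (written here in illustrative notation; $\gamma_t$ denotes a margin variable and $c > 0$ a fixed parameter) is

    p_0(\gamma_t) = c \, e^{-c (1 - \gamma_t)}, \qquad \gamma_t \le 1,

whose mean is $1 - 1/c$: the associated penalty affects only margins smaller than this value, while larger margins are not penalized, at a rate controlled by $c$.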

3.2 The computational algorithm for UQ-CHI

The full formulation of the proposed UQ-CHI model is shown below:

(8a)
(8b)
(8c)

Essentially, solving the optimization formulation in Eq. (8) amounts to computing the relative entropy projection of the overall prior distribution onto the admissible set of distributions that are consistent with the constraints. In what follows, we develop the computational algorithm to solve formulation Eq. (8) and further derive the method for prediction on new samples.

3.2.1 Step 1: Training the model

In the training step, we consider the joint distribution over the model parameters, the labels, and the margin vector, while holding the remaining quantities fixed. In this step, we first explain the solution to the MED optimization problem subject to the terms in (3).

Lemma 3.1.

Let the loss function be a non-increasing and convex function of the margin, and let the Lagrangian of the optimization problem be defined with a set of non-negative Lagrange multipliers, one for each constraint. Given the prior distribution, the model distribution, and the discriminant function, and minimizing the relative entropy in terms of the KL-divergence subject to the set of defined constraints, the MED optimization problem (3) can be written as:

(9)

Here, the normalization constant is defined as:

(10)

The proof of Lemma 3.1 can be found in Appendix A. The model training problem is thus revealed to be another optimization problem, namely learning the optimal Lagrange multipliers by solving the dual objective function under a positivity constraint. Based on the results from Lemma 3.1, after adding dual variables for the constraints in Eq. (8), the Lagrangian of the optimization problem can be written as:

(11)

In order to find a solution, we require:

(12)

This results in the following theorem.

Theorem 3.2.

The solution to the UQ-CHI problem has the following general form:

(13)

Thus, finding the solution to (8) depends on being able to evaluate the normalization constant.

Lemma 3.3.

Let the normalization constant be defined as in Eq. (10). Based on the finding in (13), it can be reformulated as follows:

(14a)
(14b)
(14c)
(14d)

Here, the quantities appearing in (14a) are defined in (14b)-(14d).

The proof of Lemma 3.3 can be found in Appendix B. Given the reformulated normalization constant in (14), the maximum of the jointly concave objective function shown in Eq. (9) can be found through constrained non-linear optimization. As a result, by substituting Eq. (14) into Eq. (9) we get:

(15)

Thus, we have the following dual optimization problem:

(16)

The Lagrange multipliers are recovered by solving the convex optimization problem in Eq. (16). Note that since the prior factorizes across the samples, the UQ-CHI solution factorizes as well.

Corollary 3.4.

From the results in Theorem 3.2, the marginal distribution can be found as follows:

(17)

Here, the required quantities can be obtained from Eq. (14b) and (14c).

3.2.2 Step 2: Prediction

After obtaining the marginal distribution in (17), the following lemma is used to predict the label of a new example. Referring to the solution of the UQ-CHI problem in (13), we can easily adapt the approach to predict the label of a new input sample. In what follows, we derive the predictive label for new samples.

Lemma 3.5.

Given the marginal distribution in (17) and the convex combination of discriminant functions, let the optimal Lagrange multipliers be obtained from the optimization problem (16) and the remaining quantity from (14d); then the predictive label for the new sample can be generated as:

(18)

The proof of Lemma 3.5 is shown in Appendix C.
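As a simple illustration of why a closed form is available (again in assumed notation, with a Gaussian marginal over the parameters and a linear index, which need not match the exact distribution in (17)), the expected discriminant and hence the predicted label reduce to

    \hat{y}_{\text{new}} = \operatorname{sign}\!\left( \int p(\mathbf{w}) \, \mathbf{w}^\top \mathbf{x}_{\text{new}} \, d\mathbf{w} \right) = \operatorname{sign}\!\left( \boldsymbol{\mu}_{\mathbf{w}}^\top \mathbf{x}_{\text{new}} \right),

so no sampling is needed at prediction time.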

3.2.3 Summary of the algorithms

A full description of the training and prediction of UQ-CHI model is given in Algorithm 1.

Input: training data (with possibly partial labels), the prior distributions, and the parameter c
Output: predictive labels for upcoming new samples
while not converged do
    for iterations t = 1, 2, ... do
        Step 1 - Training: solve the dual optimization problem (16) for the Lagrange multipliers and form the marginal distribution (17)
        Step 2 - Prediction: predict the label of a new example via (18)
    end for
end while
Algorithm 1: The UQ-CHI algorithm

3.3 UQ-CHI with rejection option

Typically, the performance of a prediction model is evaluated based on its accuracy under a scheme that classifies all samples, regardless of the degree of confidence associated with each classification. However, accuracy is not the only measurement that can be used to judge a model’s performance. In many healthcare applications, it is safer to make predictions only when the confidence assigned to the classification is relatively high, rather than classify all samples even if confidence is low. In this case, a sample can be rejected if it does not fit confidently into any of the classes. In pattern recognition, this problem is typically solved by estimating the class conditional probabilities and rejecting the samples that have the lowest class posterior probabilities, i.e., the most unreliable samples. As UQ-CHI enables uncertainty quantification, here we create a rejection option in prediction to show the utility of uncertainty quantification in practice. The basic idea of the rejection option is that the model declines to generate a prediction if the uncertainty is higher than a given threshold. In other words, a sample that is most likely to be misclassified is rejected, as described below:

(19)

Here, T controls the rejection rate: samples for which the maximum posterior probability is below the threshold are rejected. A sample is accepted when:

(20)

Thus, we define a classification with rejection such that, if a sample is rejected, a dedicated rejection label is assigned; otherwise, the sample receives the classification defined in Eq. (18).
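A minimal sketch of the rejection rule (Python; the posterior probabilities are assumed to come from the trained model, and the threshold value is illustrative):

    import numpy as np

    def classify_with_rejection(posterior_pos, threshold=0.6):
        # posterior_pos holds P(y = +1 | x) for each sample (illustrative input).
        # Return +1/-1 predictions, or 0 (reject) when the maximum class posterior
        # falls below the threshold.
        posterior_pos = np.asarray(posterior_pos)
        conf = np.maximum(posterior_pos, 1.0 - posterior_pos)   # max class posterior
        labels = np.where(posterior_pos >= 0.5, 1, -1)
        return np.where(conf >= threshold, labels, 0)

    # Accuracy is then computed on the accepted samples only.
    p = np.array([0.95, 0.52, 0.10, 0.61, 0.49])
    y = np.array([1, -1, -1, 1, 1])
    pred = classify_with_rejection(p, threshold=0.6)
    accepted = pred != 0
    print("rejection rate:", 1 - accepted.mean())
    print("accuracy on accepted:", (pred[accepted] == y[accepted]).mean())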

Label ratio (%)   Training ratio (%)   UQ-CHI (reject 20%)   UQ-CHI (reject 40%)   UQ-CHI (reject 60%)   CHI
Low = 10          30                   0.69                  0.74                  0.81                  0.61
Low = 10          50                   0.73                  0.76                  0.83                  0.62
Low = 10          70                   0.75                  0.77                  0.85                  0.65
Medium = 20       30                   0.66                  0.72                  0.73                  0.55
Medium = 20       50                   0.69                  0.73                  0.74                  0.60
Medium = 20       70                   0.71                  0.75                  0.78                  0.64
High = 50         30                   0.64                  0.69                  0.72                  0.53
High = 50         50                   0.67                  0.71                  0.73                  0.56
High = 50         70                   0.70                  0.73                  0.75                  0.60
Table 1: Corresponding testing accuracies for different rejection options for the simulated dataset

3.4 Tractability of UQ-CHI related to design of prior distribution

Recall that by applying MED to our optimization problem we no longer learn a point estimate of the model parameters; instead, we specify probability distributions. These distributions give rise to penalty functions for the model and the margins via the KL-divergence. In detail, the model distribution gives rise to a divergence term that corresponds to the regularization penalty, and the margin distribution gives rise to a divergence term that corresponds to the loss function. The trade-off between classification loss and regularization is now on a common probabilistic scale, since both terms are based on probability distributions and KL-divergence. Hence, there is a relationship between defining a prior distribution over margins and parameters and defining the objective function and the penalty term of the original formulation. Recall that the classification margins act as slack variables in the optimization, representing the minimum margin that each sample must satisfy. Hence, the choice of the margin distribution corresponds to the use of slack variables in the formulation of UQ-CHI. In our case, we set the margin prior as described in Section 3.1. If we mathematically expand the normalization function in (10), we obtain the two terms shown in (14), and given the choice of margin priors in Section 3.1 we get:

(21)

From (21) we can see that a penalty occurs only when the margins are smaller than the prior mean, and any margins larger than this quantity are not penalized. The margin distribution becomes peaked as the parameter c grows large, which is equivalent to having fixed margins. If the margin values are held fixed, the discriminant function might not be able to separate the training examples with such pre-specified margin values; for non-separable datasets this generates an empty convex hull for the solution space. Thus, we need to revisit the setting of the margin values and the loss function defined upon them. The parameter c plays an almost identical role as a regularization parameter that upper bounds the Lagrange multipliers. Note that if the objective function were to grow without bound, the search space for the parameters would no longer be a convex hull, compromising the uniqueness and solvability of the problem. Therefore, the prior should be selected so that the objective is a concave function with a unique optimum in the Lagrange multiplier space.
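As an illustration under the exponential margin prior sketched in Section 3.1 (the notation $\gamma_t$, $\lambda_t$, $c$ is assumed), the margin-related factor of the normalization constant integrates in closed form, giving a per-sample potential of the type Eq. (21) refers to:

    -\log \int_{-\infty}^{1} c \, e^{-c(1-\gamma_t)} \, e^{-\lambda_t \gamma_t} \, d\gamma_t \;=\; \lambda_t + \log\!\Big(1 - \frac{\lambda_t}{c}\Big), \qquad 0 \le \lambda_t < c .

The barrier term diverges as $\lambda_t \to c$, which makes explicit why each Lagrange multiplier is effectively upper-bounded by $c$.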

4 Numerical studies

In this section, we design our simulation studies to evaluate the efficacy of UQ-CHI in terms of prediction and uncertainty quantification, in comparison with the CHI model under a variety of practical scenarios.

4.1 Simulated dataset

We simulate data following the procedure described below. The synthetic dataset is generated with two classes and partial labels. We conduct several experiments with the simulated data to investigate the performance of our method across different settings. Without loss of generality, we assume that there are two groups, normal vs. diseased, with a given proportion of the normal class and a given proportion of complete labels. For all the experiments, we fix the number of features, and for each class we simulate a fixed number of subjects.
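Since the exact generating process is only partially specified above, the following sketch (Python; the sizes, noise level, and drift are illustrative, not the authors' exact simulator) shows one way to generate two-class multivariate longitudinal trajectories, give the diseased class a monotone drift, and mask a fraction of the labels:

    import numpy as np

    rng = np.random.default_rng(42)
    n_subjects, n_features, n_times = 100, 10, 5   # illustrative sizes
    label_ratio, normal_fraction = 0.8, 0.5        # fraction labeled / fraction normal

    subjects, labels = [], []
    for i in range(n_subjects):
        diseased = rng.random() > normal_fraction
        # Diseased subjects drift monotonically away from baseline over time.
        drift = np.linspace(0, 1, n_times)[:, None] if diseased else np.zeros((n_times, 1))
        X = rng.normal(0, 0.3, (n_times, n_features)) + drift
        subjects.append(X)
        labels.append(1 if diseased else -1)

    labels = np.array(labels, dtype=float)
    # Mask a fraction of the labels to mimic partial label availability.
    mask = rng.random(n_subjects) > label_ratio
    labels[mask] = np.nan
    print("unlabeled subjects:", int(np.isnan(labels).sum()))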

4.2 Incomplete labels and length of longitudinal data

UQ-CHI can handle partial labels well, i.e., by assigning a prior distribution over the labels and obtaining posterior distributions after model training. In our experiments, we consider low, medium, and high levels of label availability, i.e., 10%, 20%, and 50% of unlabeled examples. Also, we evaluate our methodology’s robustness in the presence of down-sampling of the training data, i.e., using only a percentage of the data (ranging over 30%, 50%, and 70%) to train both the UQ-CHI and CHI models. A model that can predict well with less longitudinal data holds great value in clinical applications.

4.3 Uncertainty quantification with rejection option

As mentioned in Section 3.3, UQ-CHI has a unique rejection-option capacity: the algorithm declines to predict on a sample if it cannot be predicted reliably. The key parameter is the threshold used in the rejection option. In our experiments, we use several levels of the threshold to create a range of rejection options from loose to strict, and calculate the resulting accuracies of the predictions on the accepted samples. Specifically, we vary the size of the rejection region from 20%, to 40%, to 60%.

4.4 Parameter tuning and validation

In our experiments, we randomly split the data into two parts, one for training and one for testing. For the training dataset, we use 10-fold cross-validation to tune the parameters. The average accuracies on the testing dataset are reported in the results section. In Section 3.4 we specify under which conditions the computation remains tractable: based on the choice of the margin distribution described in Section 3.4, each Lagrange multiplier is bounded by the parameter c. Recall that c is a parameter in the prior for the margins; therefore, the parameter c plays an important role. Hence, we conduct experiments with c chosen from {1.5, 3, 5, 10, 20, 100} to see the impact of various choices of c on the testing accuracy.
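A sketch of the tuning loop (Python; train_fn and score_fn are hypothetical placeholders for the UQ-CHI training and evaluation routines, and the grid mirrors the values reported in Table 2):

    import numpy as np
    from sklearn.model_selection import KFold

    # Illustrative grid for the margin-prior parameter c.
    c_grid = [1.5, 3, 5, 10, 20, 100]

    def cross_validate_c(X, y, train_fn, score_fn, n_splits=10, seed=0):
        # Pick c by 10-fold cross-validation on the training split; train_fn(X, y, c)
        # and score_fn(model, X, y) stand in for the actual UQ-CHI routines.
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
        mean_scores = []
        for c in c_grid:
            scores = []
            for tr, va in kf.split(X):
                model = train_fn(X[tr], y[tr], c)
                scores.append(score_fn(model, X[va], y[va]))
            mean_scores.append(np.mean(scores))
        return c_grid[int(np.argmax(mean_scores))], mean_scores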

4.5 Discussion

In the following, we discuss the tractability of the model given the simulated data for various choices of the parameter c, reported in Table 2. We simulated different selections of the parameter to check its impact on the testing accuracy; if increasing this parameter has no effect on the performance, we can ignore the higher values for the reasons discussed in Section 3.4. The results show that for larger values of the parameter c the accuracy decreases. As shown in Table 2, further increases of the parameter do not carry large effects, as the margin distribution may have become peaked, which is equivalent to having fixed margins. Note that to test the impact of the parameter we simulated the data with a fixed proportion of the normal class and complete labels. We observe that after increasing the value of the parameter beyond a moderate level (around 10 in Table 2), the performance of the model does not change significantly, which indicates that the margin distribution may have become peaked and hence is effectively equal to a fixed value. Higher values of this parameter generate relatively similar performance. Consequently, lower values of c preserve the flexibility to estimate a distribution over the parameters instead of using fixed margins.

Next, we examine how incomplete label information affects the performance of UQ-CHI in terms of testing accuracy, given different sampling ratios, in Table 3. A model that can be trained with less training data is more promising in healthcare applications, where data collection is relatively costlier than in other real-world applications. The results in Table 3 show that even with 50% of the labels missing, UQ-CHI achieves testing accuracies between 0.74 and 0.78. This confirms that the model is capable of performing well in the face of a lack of label information.

Incorporating a rejection option into the model improves the prediction accuracy of the classifier. There is a general relationship between the testing accuracy and the rejection rate: the testing accuracy increases monotonically with increasing rejection rate. The testing accuracies for different rejection options are reported in Table 1. Comparing varying rejection rates for UQ-CHI confirms that, for a high rejection rate of 60%, the testing accuracy reaches up to 0.85 for a label ratio of 10%, which is a promising result compared with lower rejection rates. In Table 1, we also compare our methodology with the CHI framework. Recall that CHI is not strictly a supervised learning problem; in huang2017chi , both simulation studies and real-world applications demonstrated that CHI can still be trained and used to predict without label information. However, we show that UQ-CHI can generate relatively better performance than CHI by incorporating the rejection option: UQ-CHI obtains testing accuracies in the range of 0.81 to 0.85 for a rejection rate of 60% and a labeling ratio of 10%.

Parameter c Testing accuracy
1.5 81.2
3 80.2
5 79.8
10 77.2
20 77.3
100 76.1
Table 2: Model average testing accuracy (%) for the simulated dataset
Sample ratio (%)   Label ratio: Low = 10%   Medium = 20%   High = 50%
30                 0.85 ± 0.033             0.80 ± 0.032   0.74 ± 0.033
50                 0.86 ± 0.060             0.83 ± 0.053   0.76 ± 0.027
70                 0.88 ± 0.074             0.85 ± 0.041   0.78 ± 0.037
Table 3: The average classification accuracies and standard deviations (%) for the simulated dataset

5 Real-world application on Alzheimer’s disease

We further test UQ-CHI on an Alzheimer’s disease dataset that exhibits monotonic disease progression. We use the FDG-PET images of 162 patients (Alzheimer’s Disease: 74, Normal aging: 88) downloaded from the ADNI (www.loni.usc.edu/ADNI). The data are sampled at irregular time points, where each patient has at least three and at most seven time points. The data are preprocessed, and the Automated Anatomical Labeling (AAL) atlas is used to segment each image into 116 anatomical volumes of interest (AVOIs). For this study, 90 AVOIs that are in the cerebral cortex are selected (each AVOI becomes a variable here). According to the mechanism of FDG-PET, the measurement of each region is the local average FDG binding count, which represents the degree of glucose metabolism. Glucose metabolism declines as a function of aging, and the progression of many neurodegenerative diseases such as AD further accelerates this decline. Thus, the ADNI dataset provides an ideal application example to test the proposed method. While the ADNI dataset consists of fully labeled examples, we manipulate the dataset settings to create a variety of uncertainties in the label information.

The results of tuning the parameter c for the ADNI dataset are reported in Table 4. As before, the accuracy decreases for larger values of the parameter. Table 5 shows the performance of UQ-CHI across different uncertainty levels as well as different sampling ratios. The proposed method shows an excellent capability to quantify the uncertainties for the real-world dataset. As shown in Table 5, UQ-CHI is even capable of dealing with data that has 50% incomplete labels, with an accuracy in the range of 0.70 to 0.76 for the ADNI dataset.

On the other hand, we show that by using a proportion of the training samples as low as 30% of the data, we can still maintain reasonable performance (in the range of 0.70 to 0.82 across label ratios in Table 5), which indicates that UQ-CHI can be trained with less training data. The testing accuracies for the different rejection options and training ratios are shown in Table 6. Incorporating a rejection option into the model improves the prediction accuracy of the classifier: comparing different rejection rates for UQ-CHI confirms that for a high rejection rate of 60%, the testing accuracy can reach 0.87, which is a promising result compared with lower rejection rates.

Parameter c Testing accuracy
1.5 78.8
3 77.9
5 77.3
10 75.3
20 72
100 68.9
Table 4: Model average testing accuracy (%) for the ADNI dataset
Sample ratio (%)   Label ratio: Low = 10%   Medium = 20%   High = 50%
30                 0.82 ± 0.022             0.79 ± 0.052   0.70 ± 0.032
50                 0.84 ± 0.014             0.82 ± 0.005   0.74 ± 0.049
70                 0.87 ± 0.040             0.83 ± 0.032   0.76 ± 0.043
Table 5: The average classification accuracies and standard deviations (%) for the ADNI dataset
Label ratio (%)   Training ratio (%)   UQ-CHI (reject 20%)   UQ-CHI (reject 40%)   UQ-CHI (reject 60%)   CHI
Low = 10          30                   0.71                  0.76                  0.83                  0.64
Low = 10          50                   0.75                  0.78                  0.84                  0.66
Low = 10          70                   0.77                  0.79                  0.87                  0.70
Medium = 20       30                   0.67                  0.71                  0.72                  0.58
Medium = 20       50                   0.70                  0.72                  0.75                  0.62
Medium = 20       70                   0.71                  0.75                  0.76                  0.63
High = 50         30                   0.66                  0.70                  0.71                  0.55
High = 50         50                   0.69                  0.71                  0.73                  0.58
High = 50         70                   0.71                  0.72                  0.74                  0.62
Table 6: Corresponding testing accuracies for different rejection options for the ADNI dataset

6 Conclusion

In this paper, we develop the UQ-CHI method to enable uncertainty quantification for continuous patient monitoring. This probabilistic generalization will facilitate a few extensions of the basic CHI model for decision-making purposes. For example, in many degenerative disease conditions such as AD, it is essential to triage patients to determine the priority of resource allocation and patient care. The UQ-CHI framework would therefore equip us to make sound decisions under imperfect and continuously arriving knowledge. In the future, we would like to extend this method to other degenerative diseases that may show different degradation characteristics. Another extension of this methodology is to apply it to a non-linear index and further explore the feasibility of different discriminant functions.

Appendix A Proof to Lemma 3.1

Proof.

By adding a set of dual variables, one for each constraint, the Lagrangian of the optimization problem in (3) can be written as:

(22)

In order to find the solution to Eq. (3), and given the definition of the KL-divergence in (5), we require:

(23)

The solution to the MED optimization problem has the following general form:

(24)

Here, the normalization constant is defined in (10), and the general exponential form of the solution becomes:

(25)

Hence, the dual of the MED problem is as shown in (9).

Appendix B Proof to Lemma 3.3

Proof.

Let the normalization constant be defined as in Eq. (10). Given the constraints in (8), it can be reformulated as follows:

(26a)
(26b)
(26c)
(26d)
(26e)
(26f)

Given the priors in (7), each term in Eq. (26) can be reformulated as follows. For the terms in (26d) and (26e) we have the following:

(27)

And for the last term (26f) we have the following:

(28)

Substituting the results from Eq. (27) and (28) into (26) results in Eq. (14). ∎

Appendix C Proof to Lemma 3.5

Proof.

Given the marginal distribution in (17) and the convex combination of discriminant functions, we have the following:

(29)

Given the prior distributions, (29) can be written as:

(30)