Nonparametric estimation of utility functions

07/27/2018 ∙ by Mengyang Gu, et al. ∙ ibm Johns Hopkins University 0

Inferring a decision maker's utility function typically involves an elicitation phase where the decision maker responds to a series of elicitation queries, followed by an estimation phase where the state-of-the-art is to either fit the response data to a parametric form (such as the exponential or power function) or perform linear interpolation. We introduce a Bayesian nonparametric method involving Gaussian stochastic processes for estimating a utility function. Advantages include the flexibility to fit a large class of functions, favorable theoretical properties, and a fully probabilistic view of the decision maker's preference properties including risk attitude. Using extensive simulation experiments as well as two real datasets from the literature, we demonstrate that the proposed approach yields estimates with lower mean squared errors. While our focus is primarily on single-attribute utility functions, one of the real datasets involves three attributes; the results indicate that nonparametric methods also seem promising for multi-attribute utility function estimation.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Making decisions under uncertainty using decision theory requires that beliefs about uncertainties be represented by probabilities and preferences over outcomes be summarized by utilities. The latter is often accomplished by assessing a decision maker’s utility function over a single attribute (such as monetary units) or over multiple attributes, depending on the decision situation under consideration.

The vast literature on modeling utility functions describes schemes where the decision maker responds to elicitation queries; these responses are then subsequently used to infer the decision maker’s utility function. We distinguish between these phases and refer to them as elicitation and estimation respectively. Posing elicitation questions in a specific form and then estimating the functional form that best represents the decision maker’s preferences often go hand in hand. In this paper, we contribute to the literature on estimation by introducing a Bayesian nonparametric approach for modeling utility functions. Specifically, we demonstrate the advantages of using a Gaussian stochastic process (henceforth GaSP) over the state-of-the-art, where parametric methods and the nonparametric method of linear interpolation are most popular. We show that linear interpolation corresponds to a subclass of our GaSP model using a specific covariance function in which the process turns to be a Wiener process (Mörters and Peres (2010)); moreover, parametric methods can be incorporated into the GaSP model through the mean function.

While the GaSP model has been popular in domains such as nonlinear regression and classification in machine learning (

Rasmussen and Williams (2006)), spatial statistics (Cressie (1993)]) and computer model emulation and calibration (Sacks et al. (1989)

), it has not been applied to the problem of estimating utility functions, to the best of our knowledge. There is some related work in the artificial intelligence literature on preference modeling with GaSP. For instance,

Chu and Ghahramani (2005) and Guo et al. (2010) use GaSP for preference learning over a set of items. In contrast, our work is on estimating utility functions based on elicitation schemes from the operations research and management science literature (see e.g. Farquhar (1984)).

A fundamental challenge in utility elicitation is that the number of assessed queries is often very small, since elicitation burden increases significantly and responses become less reliable when more questions are posed. The main contribution of this work is a robust estimation approach using GaSP models that reduces predictive errors when there are a small number of observations, in comparison with approaches such as parametric methods and linear interpolation. Furthermore, we provide a coherent model to accurately quantify the uncertainty associated with the utility elicitation process, as compared to the point predictions that are popular in the literature. Another benefit of such a coherent model is that we are able to assess risk attitudes of decision makers more accurately than current adopted methods. Our proposed methods for utility elicitation are available online as an R package (Gu et al. (2018a)).

The remainder of the paper is organized as follows. In sections 2 and 3, we introduce basic notation and terminology, placing our contribution in the context of related literature in operations research and management science on utility elicitation and estimation. Here we motivate our research effort and also discuss some theoretical advantages of nonparametric methods. In section 4, we introduce our GaSP formulation for utility function estimation and also show how to compute derivatives for assessing risk attitude. In section 5, we make a practical case for our proposed method by comparing it with prevalent methods for estimating single-attribute utility functions using a few assessed points on the utility function. We conduct several simulation experiments and also study a real-world dataset (Abdellaoui et al. (2007)). In section 6, we delve into estimating multi-attribute utility functions, demonstrating that GaSP performs better than the estimation methods that were deployed for a three-attribute dataset (Fischer et al. (2000)). Finally, we conclude the paper in section 7.

2 Uncertainty Quantification for Utility Elicitation

Consider outcomes in a separable metric space (e.g. ). Debreu (1954) showed that preferences over uncertain outcomes in such a space are complete, transitive and continuous in iff there exists a continuous utility function representation . Behavioral literature indicates that people tend to construct their preferences during the elicitation process, often providing inconsistent responses to queries (Lichtenstein and Slovic (2006)). The implication is that a decision maker’s utility function should perhaps be considered an approximate representation of their preferences; for this and other reasons we shall discuss shortly, we make the following distinction:

Definition 1 (Noise-free vs. noisy assessment).

When the decision maker answers all the preference elicitation questions consistently with the same underlying (and typically unknown) utility function, the assessment is said to be noise-free, otherwise it is noisy.

The notion of a ‘true’ underlying utility function can be viewed as a theoretical construct and one that has been discussed often in the literature. One way to model the uncertainty in preference elicitation responses is to include a random response error to a systematic component, either as additive error (Laskey and Fischer (1987)) or as random parameters of the utility function (Eliashberg and Hauser (1985), Fischer et al. (2000a)). Another approach is to treat the utility function as inherently stochastic (Becker et al. 1963). In practice, decision analysts are well aware of these issues and have devised measures to counter inconsistencies (see for example Keeney and Raiffa (1976)).

Making a noise-related distinction enables us to express different sources of uncertainty in the elicitation process. One source of uncertainty is prediction uncertainty, representing the system’s uncertainty about the decision maker’s utility at an unassessed . This uncertainty is present regardless of whether decision makers are consistent with their answers. The second source of uncertainty depends on the decision maker. When they consistently answer questions with the same underlying utility function, the elicited utility is identical to at each assessed . We refer to an estimation method that agrees with the elicited utility at each assessed as an interpolator. It is ideal for a method to be an interpolator for a consistent decision maker. Note that the second source of uncertainty only appears when decision makers do not answer questions consistently, leading to noisy assessments.

Preference elicitation for decision making under risk is typically conducted using gambles. Consider gamble that results in outcome with probability and outcome with probability , where . In an elicitation query, the decision maker must evaluate two or more gambles presented to them, e.g. they may be asked to compare gamble with the degenerate gamble . If the assessment is noise-free and if the evaluation is done by expected utility theory (EUT), they prefer the first gamble iff , for some underlying utility function (von Neumann and Morgenstern (1947)). A subsequent estimation task must be performed to infer from the responses.

There is significant empirical evidence from the descriptive literature on prospect theory demonstrating that people tend to overweight low probabilities and underweight high probabilities, thus under prospect theory, the gamble evaluates to , where is a probability weighting function and the reference point is assumed to be (Kahneman and Tversky (1979); Tversky and Kahneman (1992)). Prospect theory also explains observed behavior such as loss aversion and diminishing sensitivity relative to the reference point.

Bleichrodt et al. (2001) and others argue that descriptive violations of expected utility bias utility elicitations, therefore the design of elicitation schemes and estimation of utility functions should be conducted based on prospect theory rather than expected utility theory, resulting in more accurate estimates of . We are sympathetic to this view but as we explain in the next subsection, our research goal is purely that of estimating , provided elicitation responses in the following form:

Definition 2 (Assessed tuples).

Assessed tuples are assessments of points on the utility function, for , .

We propose a GaSP approach to estimating a utility function given assessed tuples as input. Our method can thus be applied to different data sets and is agnostic to the underlying theory. In fact, we demonstrate GaSP estimation using elicitation schemes based on both expected utility theory and prospect theory. Furthermore, it is also agnostic to the generative structure of randomness in the case of probabilistic preferences, i.e. whether the data is generated by an additive noise model, or a utility function with random parameters, or indeed any stochastic utility model. As mentioned earlier, we consider both noise-free and noisy assessments. Note that for noise-free assessments, assessed tuples for underlying utility function .

We refer the reader to Farquhar (1984) for a comprehensive review of preference elicitation schemes involving single attributes; Keeney and Raiffa (1976) also describe elicitation schemes for multi-attribute problems.

3 Utility Estimation

The goal of any estimation task associated with preference elicitation is to use responses to the elicitation queries to infer the decision maker’s utility function . Here we discuss parametric estimation and linear interpolation, highlighting potential limitations.

3.1 Parametric Utility Estimation

Parametric estimation is a popular approach for estimating utility functions involving a single attribute (Eliashberg and Hauser (1985); Kirkwood (2004)). The most common parametric forms are the exponential and power functions. The exponential utility function takes the form sgn exp , where and are constants and sgn is the sign of the risk tolerance parameter . The power utility function is of the form sgn sgn , with constants and , where sgn and sgn are the signs of and . Note that the power function exhibits discontinuous behavior around for ; see Wakker (2008) for details. The limiting cases for the exponential and power functions are the linear and logarithmic functions; these two families of functions are the only ones that satisfy constant risk aversion and constant relative risk aversion respectively (Pratt (1964)).

For multi-attribute utility functions, the most popular approach is to check for utility independence assumptions that decompose the multi-dimensional function into an additive or multiplicative function of one-dimensional marginal utility functions (Keeney and Raiffa (1976)

). These marginal functions typically take the form of the afore-mentioned standard parametric models. There has also been literature on utility functions that do not enforce utility independence assumptions. To account for utility dependence, for instance, parametric forms have also been used for conditional utility functions, i.e. the utility of a subset of attributes given that the other attributes are set at fixed consequences (see for example

Kirkwood (1976)). A more recent approach is to use a copula (Abbas (2009)), which has the benefit that the level of dependency between marginal utility functions can be represented in an easily interpreted parametric way. In our GaSP model, utility dependence between attributes is modeled through a product covariance function, as discussed in section 4.1.

It is not hard to see that a parametric method will not be an interpolator unless the decision maker’s underlying utility function follows the parametric class being used. The lack of this desirable property could result in large elicitation errors when the parametric class is misspecified, as we show in Section 5.1.

3.2 Linear Interpolation

An alternate approach that is popular in the empirical literature on estimating single-attribute utility functions is that of piece-wise linear interpolation across assessed tuples (Abdellaoui (2000); Abdellaoui et al. (2007)). Such an approach is essentially equivalent to the predictive mean of the extended Wiener process, defined as a stochastic process

with independent, normally distributed increments

for with continuous sample paths (Karlin (1975)). A Wiener process is typically defined to have initial value but we relax this assumption. The predictive distribution formalized in the following lemma. (Please see Appendix for all proofs.)

Lemma 1.

Assume , follows a Wiener process. Assume we have observations with . For any , for any and , the predictive distribution of given is

where and

Lemma 1 states that if the utility function is modeled as a Wiener process for a single attribute , the posterior mean for assessing the utility at a point (that has not been assessed) is equivalent to linear interpolation between two nearby assessed tuples.

Figure 1: Interpolation of the function plotted as black curves with 12 assessments equally spaced in (black circles). Predictions by the extended Wiener process (left panel) and the GaSP model (right panel) with the default setting in the RobustGaSP R Package (Gu et al. (2018a)

) are plotted as blue curves . The 95% predictive confidence intervals are shown by the grey area.

Indeed, a Wiener process is a special case of GaSP (which we will formally define in the next section) with initial value , mean zero and covariance function at any , and continuous sample path. However, a Wiener process is not differentiable everywhere and consequently using the posterior mean (i.e. linear interpolation) poses problems for estimating the coefficient of risk aversion.

Example 1 (Linear interpolation and overconfidence).

Consider Figure 1, which displays the function (treated as unknown) whose values are assessed at 12 equally spaced points in . The left panel shows the prediction (blue curves) by the extended Wiener process (for which we do not assume ). Not only does the prediction show discrepancy at places where the derivatives of the function change, it is also clearly overconfident as the 95% confidence interval covers regions that are a lot smaller than the nominal 95%. In comparison, the right panel is the prediction by the method we propose with the same 12 assessed points, using the default setting in the RobustGaSP R Package (Gu et al. (2018a)

). The improvement arises due to the use of other covariance functions with objective Bayesian inference, discussed in the next section in detail. While this example illustrates a shortcoming of linear interpolation using a generic function that is measured at a finite number of values in its domain, the above weakness is relevant for our purposes of estimating a utility function using experimentally assessed tuples.

Another limitation of a Wiener process estimation approach is that it is only defined in a one-dimensional domain and thus is limited to single-attribute utility function estimation. Although there is some literature on interpolation in multi-attribute problems (see for example Bell (1979)

), it is a challenging task due to the curse of dimensionality. A more general GaSP approach with suitable covariance functions built on the space of multiple attributes may be able to better handle the estimation task.

3.3 Quantile-Parameterized Distributions

Quantile-parameterized distributions (QPD) have recently been introduced for modeling uncertainties in decision analysis (Keelin and Powley (2011); Hadlock and Bickel (2017)

). This approach characterizes a continuous probability distribution based on a number of assessed quantile/probability pairs. Although QPDs have not been discussed in the context of utility elicitation, it is straightforward to apply the approach conceptually as assessed tuples are analogous to quantile/probability pairs. Denoting these assessed tuples as

for , where is the assessed utility scaled from , the inverse CDF of a QPD takes the following form:

with left handed limit , right handed limit , and referring to basis functions for , . When the domain of is a real line, a simple choice is the Q-normal distribution, where the basis functions are , , , , with being the normal CDF (Keelin and Powley (2011)). Note that the results depend entirely on the choice of basis functions; further research is required to explore suitable models for utility elicitation. Here we consider QPDs solely as another benchmark for the GaSP model since they provide some flexibility to model a curve ranging from .

4 Estimation with GaSP

We introduce a Bayesian nonparametric method that takes assessed tuples as training data input, regardless of the underlying theory and assumptions used to derive them, and provide an estimated utility function , where could either be a single attribute or multiple attributes. Bayesian inference explicitly provides a formal way to quantify the uncertainty introduced by the estimation through the likelihood and prior, as discussed in detail in this section.

4.1 Model Formulation

To set notation, let

be a vector of

different attributes and let be the utility evaluated at . Let us consider a random utility function modeled in a general regression way with the following form,


where is the mean function, modeled as

where is assumed to be a dimensional domain dependent basis function for any , with unknown regression parameters for each basis function . could be chosen, e.g., as a particular parametric form as specified in Section 3 or as a polynomial function in . For the additive residual term, instead of taking as independent measurement errors as in Eliashberg and Hauser (1985), we model as a stationary GaSP


with variance

and the pair-wise correlation function

. In return, the joint distribution of any

inputs , follows a multivariate normal distribution,


i.e. a normal distribution that is conditional on the unknown variance and the Gram matrix (Rasmussen and Williams (2006)) whose element is . By definition, the covariance of the utility is

for any . If (i.e. single attribute) and , GaSP becomes a Wiener Process when the process is defined on , with initial value . To extend the definition to the case when , the isotropic assumption is sometimes made for modeling a spatial process (Gelfand et al. (2010)), meaning that the correlation function is only a function of where is the Euclidean distance. However, the domain of attributes typically varies on completely different scales (e.g. between the price and comfort of a car), so the effect of the attributes on the correlations will be highly variable. Consequently, the assumption of isotropy may not be reasonable. Instead, the product correlation function is often assumed:


where is a one-dimensional correlation function for the attribute. We list several frequently used correlation functions in Table 1. The difference between the above product correlation function and the isotropic assumption is that for the former, there are parameter(s) in each correlation that can control the smoothness of the utility function on this attribute (which could be learned from the data), while the latter assumes the Euclidean distance.

Power Exponential ,
Rational Quadratic
Table 1: Popular choices of correlation functions, when with for . Here is the roughness parameter, is the range parameter, is the gamma function and is the modified Bessel function of second kind of order .

The power exponential covariance and the Matérn covariance have been used in many applications. When where , Matérn correlation has a closed form. For example, when the roughness parameter , the Matérn correlation is


where . GaSP with the Matérn correlation defined in Equation (4.5) means it is twice mean square differentiable (Rasmussen and Williams (2006)), allowing one to infer the risk attitude using derivatives of the GaSP as discussed later in Section 4.3. We use the Matérn correlation defined in Equation (4.5) for the purpose of demonstration but we do not preclude the use of other correlation functions with suitable differentiable results in future applications. In practice, the roughness parameter () is fixed at a chosen value (e.g. 5/2 for the Matérn), and the correlation function is essentially parameterized by the range parameters ().

4.2 Posterior Predictive Distribution

Denote assessed tuples as , where are points in the domain of the multi-attributes. As we are empirically limited by a small number of assessed utility values, we seek the posterior predictive distribution of the utility function at any input based on the assessed tuples . For simplicity, denote as the assessed utility points in the design. As per the chosen GaSP model, the likelihood is a multivariate normal likelihood:

where is the basis design matrix with element . The model parameters for posterior estimation are the mean parameter , variance parameter and range parameters in the correlation function in Equation (4.5).

We complete the model by specifying reference priors, , for the model parameters (Berger et al. (2001); Paulo (2005); Gu et al. (2018b))


where is the expected Fisher information matrix, given as


where with and is the derivative of with regard to .

After integrating using the above reference prior, the marginal posterior of follows


where the marginal likelihood is

with .

The predictive distribution from a Bayesian approach is obtained by integrating the uncertainty from the parameters


Note that cannot be marginalized out explicitly - there is no prior on that leads to a closed form marginal likelihood after integrating out . Numerical integration algorithms are both inefficient (as the computation requires the inverse of the correlation , which has operations) and less stable. Instead, is often estimated by the maximum marginal posterior mode


As discussed in Gu et al. (2018b), this leads to a robust estimation of , yielding much better results than a non-robust method such as the maximum likelihood estimator (MLE). Note that the derivative of the reference prior is computationally intensive when the sample size is large. One can use some other objective priors that both have the robust estimation properties and the closed form derivatives (see e.g. Gu (2018, 2016)).

With the above setup, the predictive distribution of at a new point , given the assessed tuples and the estimated range parameter , is a t-distribution (Berger et al. (2001))


with degrees of freedom, where , and are given respectively as,

with as the generalized least squares estimator for and being the correlation of the GaSP at and .

The predictive mean, is typically used for estimation of the utility function. The following lemma specifies that such an estimation is an interpolator.

Lemma 2 (GaSP Interpolator).

The predictive mean for any for noise-free assessments with utility function .

For noisy assessments, we do not expect the prediction of GaSP to be exact at the assessed points. To model such situations, an independent noise can be added in Equation (4.1) by defining , where

is independent white noise. Objective Bayesian inference for such a GaSP model is similar to the method discussed above (

Ren et al. (2012); Gu and Berger (2016)). The afore-mentioned methods have been developed as an R package (Gu et al. (2018a)).

4.3 Derivatives of GaSP

Differentiability is an important property of utility functions, e.g., the Arrow-Pratt measure of local risk aversion is given as . Our proposal to use GaSP is helpful in this regard since the derivative processes are also GaSP when the covariance function is mean square differentiable (Rasmussen and Williams (2006)).

Closed form expressions of derivatives of GaSP with Gaussian correlation (i.e. power exponential correlation with in Table 1) are shown in Wang (2012) (chapter 3). Here we present the derivatives of the Matérn class correlation with roughness parameter equal to 2.5. For simplicity, we assume there is only a single attribute in this subsection, but the following result can be easily extended to directional derivative processes with regard to each attribute, i.e. .

Denote and with . Let denote the element-wise product of each entry between two matrices. The following two lemmas present the joint distribution of the GaSP and its derivatives, and they can be derived through directly differentiating the covariance function (see e.g. Golchi et al. (2015)).

Lemma 3.

The joint distribution of GaSP and GaSP derivative on any is:

where , , and , and .

Lemma 4.

The joint distribution of GaSP and GaSP derivative on any is:

where , and and .

Note that the above joint distribution is used to calculate the predictive distribution of the derivative. Indeed, by marginalizing out the mean and variance parameter, the predictive distribution of given and , is a t-distribution


with degrees of freedom, where

The predictive distribution of is similar to Equation (4.12). The point here is to demonstrate that the risk attitude can be obtained using the derivative processes with full assessment of the uncertainty associated with the analysis.

5 Single Attribute Utility Function Estimation

So far we have primarily seen the theoretical benefits of estimating utility functions with GaSP. In this section, we explore practical ramifications on estimating single-attribute utility functions by first performing simulation experiments and then analyzing a real dataset.

5.1 Simulation Experiments

5.1.1 Comparing utility function estimates

In preference elicitation, only a limited number of questions can be posed, thus the number of assessed tuples is typically small: , and are considered for these experiments. Out of sample mean squared error (MSE), , is utilized for comparison, where is the equally spaced held-out point and is used for testing throughout this section. We assume and , where and are lower and upper bounds of the domain in this subsection. We compare the exponential function (Exp), power function (Pow), linear interpolation (LI) and quantile-parameterized distribution (QPD) method with the GaSP model.

For the parametric methods and QPD method, we estimated the parameters with the minimum least squares error. For the QPD, we choose the basis function to be , , , . As the domain of is , the usual Q-normal distribution is not a sensible choice. Instead, we let be a normal distribution truncated at and centered at

with standard deviation

. This seems to perform the best among all basis functions considered.

In the first scenario, we assume that assessments are noise-free. Tables 2 and 3 display the out of sample MSE when the underlying utility function is a power function and an exponential function respectively. Because the underlying utility function is assumed to be the power utility function in Table 2 and the power utility function is a subclass of models contained within the GaSP framework with mean basis and variance , we choose the mean function of the GaSP to be misspecified by selecting an inconsistent mean basis , demonstrating that GaSP performs reasonably well even in this scenario.



Table 2: Out of sample MSE when is assumed to be power with different parameters . The number of observation is assumed to be for the first four rows and for the latter four rows.



Table 3: Out of sample MSE when is assumed to be exponential with different parameters . The number of observation is assumed to be for the first four rows and for the latter four rows.

Tables 2 and 3 both indicate that the GaSP method outperforms the other methods for most of the cases. The discrepancy indicated by MSE of testing points by GaSP is usually several orders of magnitude less than that for parametric fitting, and it is usually ten to hundred times better than LI. Note that we only use and (excluding two boundary points and ), which is reasonably small for elicitation purposes, yet the predictive power achieved by GaSP is already realized. The GaSP method with assessed points even performs better than its competitors with assessed points, meaning that the cost for eliciting the utility function can drop significantly using GaSP method. Moreover, when the sample size increases, the MSE of GaSP decreases, while the MSE for parametric fitting does not necessarily decrease as it is not consistent when the class of parametric utility is misspecified.

The power and exponential families can sometimes be too restrictive, while GaSP is better for these methods because it allows a more flexible structure. The LI and QPD methods also induce a flexible structure. Indeed, they outperform GaSP for two cases when the true utility function is exponential and when the sample size is small (), as the GaSP method uses a misspecified mean function. However, as the sample size increases to for these two cases, GaSP outperforms the other methods by a lot, meaning that GaSP model converges faster than the LI and QPD methods. A more reasonable mean function would, of course, increase the prediction of the GaSP model. However, as for all cases shown in Tables 2 and 3, the mean function of the GaSP model is misspecified, but the GaSP model still shows the best prediction results for most cases. This feature makes it suitable for a default method for estimation of utility functions.

Figure 2: Utility function estimation when the true utility function is power with (upper left panel), (upper right panel) and exponential with (lower right panel), (lower right panel). Black circles represent assessed tuples and black curves are the true utility function. Blue, brown, green and red curves are the fits by the GaSP model, QPD method, exponential and power functions, respectively. The shaded area are the 95% predictive confidence interval by the GaSP model. For almost all cases, the blue and black curves overlap.

The fitted curves for the different methods and for 4 cases are plotted in Figure 2. In almost all cases, the GaSP-fitted functions (blue curves) are close to the true functions (black curves). The GaSP model has very tight 95% confidence intervals in most regions, meaning that it is confident about the prediction. The confidence interval of the GaSP is large when and in right panels in Figure 2, implying that the uncertainty of the GaSP model is high, and the prediction by the GaSP model has comparatively large discrepancy to the real utility function in these two regions. It is an advantage of the GaSP model that it has an internal assessment of the uncertainty, therefore one can determine whether and where to obtain more assessments given a required level of precision.

In the second scenario, we assume noisy assessments, specifically that utilities are random with additive noise,


where is the underlying utility function and is the variance of the Gaussian noise. Since the assessed points are noisy, we simulated experiements to average out the design effect, and calculate the average MSE: . Here the total held-out testing points for each case is thus thus .