The Bayesian paradigm provides a coherent platform to frame the beliefs and the preferences of decision makers (DMs). Once a DM has specified these in the form of a probability distribution and a utility function, then under the subjective expected utility paradigm she would act rationally by choosing a decision that maximizes her expected utility, i.e. the expectation of the utility function with respect to the probability distribution elicited from her. Although other paradigms expressing different canons of rationality exist (e.g. Giang and Shenoy, 2005; Hong and Choi, 2000; Smets, 2002), applied decision making problems have been most commonly addressed within this Bayesian framework (Gómez, 2004; Heckerman, Mamdani and Wellman, 1995).
One of the reasons behind the widespread use of Bayesian methods is the existence of formally justifiable methods that can be used to decompose utility functions and probability distributions into several others, each of which has a smaller dimension than those of a naive representation of the problem. This decomposition offers both computational advantages and more focused decision-making, since the DM only needs to elicit beliefs on small dimensional subsets of variables. This in turn has led to larger and larger problems being successfully and accurately modelled within this Bayesian framework.
The decomposition of the probabilistic part of the world is usually achieved via the notion of conditional independence (Dawid, 1979). It was long ago recognized that graphical representations of the relationships between random variables directly express a collection of conditional independences. These independences enabled large dimensional joint probabilities to be formally written as products of local distributions of smaller dimension, needing many fewer probability specifications than a direct, full specification. Many formal statistical graphical models were subsequently defined, most notably Bayesian networks (BNs) (Pearl, 1988; Smith, 2010), that exploited these conditional independences to represent the qualitative structure of a multivariate random vector through a directed graph.
There are also many independence concepts related to utility that can be used to factorize a utility function into terms with a smaller number of arguments. Standard independence concepts are based on the notion of (generalized) additive independence and (conditional) utility independence (Keeney and Raiffa, 1993). These both entail some additive or multiplicative decomposition of the utility function. Fairly recently it has been recognized that sets of such statements could also be represented by a graph, which in turn could be used to develop fast elicitation routines (see e.g. Abbas and Howard, 2005; Abbas, 2009, 2010, 2011; Braziunas and Boutilier, 2005; Engel and Wellman, 2008; Gonzales and Perny, 2004).
The class of influence diagrams (Howard and Matheson, 2005; Nielsen and Jensen, 2009; Smith and Thwaites, 2008) was one of the first graphical methods to contemporaneously depict probabilistic dependence, the form of the utility function and the structure of the underlying decision space. Fast routines to compute expected utilities and identify optimal decisions that exploit the underlying graph have been defined for a long while (e.g. Jensen, Jensen and Dittmer, 1994; Shachter, 1986). However, these are almost exclusively designed to work when the utility can be assumed to factorize additively, i.e. assuming that the utility can be written as a linear combination of smaller dimensional functions over disjoint subsets of the decision problem’s attributes. An exception is the multiplicative influence diagram (Leonelli, Riccomagno and Smith, 2015), whose evaluation algorithm works not only for additive factorizations but also for more general multiplicative ones (Keeney, 1974).
In this paper we develop a class of graphical models that can depict both probabilistic independence and sets of (conditional) utility independence statements expressible by a utility diagram (Abbas, 2010). We call these directed expected utility networks (DEUNs). We here develop two fast algorithms for the computation of expected utilities using these diagrams. The first one applies to any DEUN and consists of a sequential application of a conditional expectation operator, analogous to the chance node removal of Shachter (1986). The second algorithm is valid only for a subset of DEUNs, ones that we call here decomposable. After a transformation into a new junction tree representation of the problem, this routine computes the overall expected utility via variable elimination just as in Jensen, Jensen and Dittmer (1994), but now applied to our much more general family of utilities. We are able to demonstrate that the elimination step in DEUNs almost exactly coincides with that of standard ID evaluation algorithms. Therefore both additional theoretical results, such as approximate propagation, and code already available for IDs, designed originally for use with additive utilities, can be fairly straightforwardly generalized to be used in conjunction with a much more general utility structure.
The motivation for this work stems from a decision support system we are currently building to help local authorities evaluate the impacts of different policies in the light of endemic food poverty (Smith, Barons and Leonelli, 2015a; Smith, Barons and Leonelli, 2015b). In the initial study of Barons, Wright and Smith (2017) - to keep the analysis as simple as possible - the underlying preferential structure was assumed to factorize additively, as is commonly assumed in ID modelling and many applied decision analyses. Discussions during the elicitation process however showed that this assumption was far from ideal in this application. Currently available technology would not enable us to formally perform a decision analysis under the required much milder preferential conditions. We have thus taken on this challenge and developed new algorithms for the computation of expected utilities that enable decision makers to perform much more general decision analyses.
The only other attempt in the literature we are aware of to represent utility and probabilistic dependence in a unique graph is the expected utility network of La Mura and Shoham (1999). This is an undirected graphical model with two types of edges to represent probabilistic and preferential dependence. However, this method is built on a non-standard notion of a conditional utility function. Furthermore, fast routines for the computation of the associated expected utility have yet to be developed using this framework. In contrast, DEUNs are based on commonly used concepts of utility independences characterised by various preference relationships and so directly apply to standard formulations of decision problems.
The paper is structured as follows. In Section 2 we review the Bayesian paradigm for decision making. In Sections 3 and 4 we review independence concepts and their graphical representations for probabilities and utilities, respectively. In Section 5 we define our DEUN graphical model and in Section 6 we develop algorithms for the computation of the DEUN’s expected utilities. Section 7 presents an application of DEUNs to household food security. We conclude in Section 8 with a discussion.
2 Bayesian decision making
Let be a decision within some set of available decisions, and . Let be an absolutely continuous random vector including the attributes of the problem, i.e. the arguments over which a utility function is defined. For a subset , we let , , where is the sample space of , and denote with and instantiations of and , respectively, . Lastly, let and .
In this paper, we assume the utility function to be continuous and normalized between zero and one so that . In addition, we assume that for each attribute there are two reference values such that for every , where, for a set , .
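The expected utility of a decision $d$ - the expectation of the utility function $u$ with respect to the probability density $p$ - is then
$$\bar{u}(d) = \int u(y, d)\, p(y \mid d)\, \mathrm{d}y, \qquad (1)$$
where $y$ denotes an instantiation of the vector of attributes. A rational decision maker would then choose to enact an optimal decision $d^{*}$, where $d^{*} = \arg\max_{d} \bar{u}(d)$ and the maximum is taken over the set of available decisions.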
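As a minimal sketch of this decision rule, consider a discrete analogue in which the integral becomes a sum. The two decisions, the three attribute levels and all numerical values below are hypothetical and chosen purely for illustration.

```python
# Hypothetical toy problem: two decisions and one attribute with three levels.
utility = {"low": 0.0, "medium": 0.6, "high": 1.0}  # normalized between 0 and 1

# Assumed conditional distributions p(y | d); each row sums to one.
density = {
    "d1": {"low": 0.2, "medium": 0.5, "high": 0.3},
    "d2": {"low": 0.4, "medium": 0.1, "high": 0.5},
}

def expected_utility(d):
    """Discrete analogue of the expected utility: sum over y of u(y) p(y | d)."""
    return sum(utility[y] * p for y, p in density[d].items())

# Evaluate every available decision and pick a maximizer.
scores = {d: expected_utility(d) for d in density}
best_decision = max(scores, key=scores.get)
```

Here the first decision is preferred even though the second makes the best outcome more likely, because the second also makes the worst outcome more likely.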
This framework, though conceptually straightforward, can become very challenging to apply in practice. As soon as the number of attributes grows moderately, a faithful elicitation of the probability and utility functions becomes prohibitive. In addition to the knowledge issues in eliciting multivariate functions, the computation of the expected utility in equation (1) requires an integration over an arbitrarily large space which, again, may become infeasible in high dimensional settings. For these two reasons various additional models and independence conditions have been imposed. We review these types of conditions in the next two sections.
For ease of notation in the following we leave implicit the dependence of all arguments of functions of interest on the decision . On one hand we can assume that both the probabilistic and the utility independence structure are invariant to the choice of . We note that this is an assumption commonly made in standard influence diagram modelling. Now and may well be functions of - we simply assume that the underlying conditional independence structure and preferential independences are shared by all . But for any finite discrete space , we could alternatively apply our methods under the more general assumption that, for each , the DM’s problem could be depicted by a possibly different network. We could then apply the theory we develop below to each of these networks in turn and finally optimise over these separate evaluations - albeit more slowly.
3 Probability factorizations
The concept used in probabilistic modelling to simplify density functions is conditional independence (Dawid, 1979). For three random variables $Y_i$, $Y_j$ and $Y_k$ with strictly positive joint density we say that $Y_i$ is conditionally independent of $Y_j$ given $Y_k$, and write $Y_i \perp\!\!\!\perp Y_j \mid Y_k$, if the conditional density of $Y_i$ can be written as a function of $y_i$ and $y_k$ only, i.e. $p(y_i \mid y_j, y_k) = p(y_i \mid y_k)$. This means that, once $Y_k$ is known, $Y_j$ carries no further information about $Y_i$.
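The definition can be checked numerically on a small table. The sketch below builds a joint distribution over three hypothetical binary variables, called a, b and c in the code, in such a way that a is conditionally independent of b given c, and then verifies that conditioning on b in addition to c leaves the distribution of a unchanged. All probabilities are assumed values.

```python
from itertools import product

# Hypothetical binary variables a, b, c with joint p(c) p(a | c) p(b | c).
p_c = {0: 0.3, 1: 0.7}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [c][a]
p_b_given_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.6, 1: 0.4}}  # indexed [c][b]

joint = {
    (a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
    for a, b, c in product((0, 1), repeat=3)
}

def cond_a_given_bc(a, b, c):
    """p(a | b, c) computed from the joint table."""
    return joint[(a, b, c)] / sum(joint[(aa, b, c)] for aa in (0, 1))

def cond_a_given_c(a, c):
    """p(a | c) computed from the joint table by summing out b."""
    num = sum(joint[(a, bb, c)] for bb in (0, 1))
    den = sum(joint[(aa, bb, c)] for aa in (0, 1) for bb in (0, 1))
    return num / den

# Conditional independence holds if the two conditionals agree everywhere.
holds = all(
    abs(cond_a_given_bc(a, b, c) - cond_a_given_c(a, c)) < 1e-12
    for a, b, c in product((0, 1), repeat=3)
)
```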
Sets of conditional independence statements can then be depicted by a graph whose vertices are associated to the random variables of interest. We next briefly introduce some terminology from graph theory and then define one of the most common statistical graphical models, namely the Bayesian network.
3.1 Graph theory
A directed graph is a pair , where is a finite set of vertices and is a set of ordered pairs of vertices, called edges. A directed path of length from to in a graph is a sequence of vertices such that, for any two consecutive vertices and in the sequence, . If there is a directed path from to in we write . We use the symbol if there is no such directed path in . Conversely, an undirected path is a sequence of vertices such that either or . A cycle is a directed path with the additional condition that . For , we say that and are connected if there is an undirected path between and . A graph is connected if every pair of vertices is connected. A directed acyclic graph (DAG) is a directed graph with no cycles. For these graphs the labelling of the vertices can be constructed, not uniquely, so that if .
Now let be a DAG. If we say that is a parent of and that is a child of . The set of parents of is denoted by . A vertex of a DAG with no children is called a leaf, whilst a root is a vertex with no parents. A DAG is said to be decomposable if all pairs of parents of the same child are joined by an edge. A subset of is a clique of if any pair is connected by an edge and there is no other with the same property such that . Let have cliques and suppose the elements of are ordered according to their indexing. A separator of , , is defined as . The cliques of are said to respect the running intersection property if for at least one , .
The directed graph in Figure 0(a) can be clearly seen to be a DAG with vertex set equal to . This is decomposable since the two parents of vertex 3, i.e. 1 and 2, are connected by an edge. This DAG is also connected since every two vertices are connected by an undirected path. The cliques of the DAG in Figure 0(a) are , and and its separators and . So with this indexing the cliques of this DAG respect the running intersection property.
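The decomposability condition on parents is simple to test mechanically. The following sketch uses a hypothetical DAG stored as a set of ordered pairs; it shares with the example above the feature that vertices 1 and 2 are both parents of vertex 3, but it is not claimed to be the graph of the figure.

```python
from itertools import combinations

# Hypothetical DAG, stored as a set of directed edges (parent, child).
edges = {(1, 2), (1, 3), (2, 3), (3, 4)}

def parents(vertex, edge_set):
    """Return the set of parents of a vertex."""
    return {i for (i, j) in edge_set if j == vertex}

def is_decomposable(edge_set):
    """True if every pair of parents of a common child is joined by an edge."""
    vertices = {v for edge in edge_set for v in edge}
    for child in vertices:
        for a, b in combinations(sorted(parents(child, edge_set)), 2):
            if (a, b) not in edge_set and (b, a) not in edge_set:
                return False
    return True

decomposable = is_decomposable(edges)
# Dropping the edge between the two parents of vertex 3 breaks the property.
not_decomposable = is_decomposable(edges - {(1, 2)})
```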
A graph of interest in this paper is the directed tree . This is a DAG with the following two properties: it has a unique vertex with no parents called root; and all other vertices have exactly one parent. The DAG in Figure 0(b) can be clearly seen to be a directed tree with root and leaves and .
3.2 Bayesian networks
We are now ready to define the statistical graphical model that underpins the probabilistic part of the DEUN model we define below.
A BN over a random vector consists of
conditional independence statements of the form , where ;
a DAG with vertex set and edge set ;
conditional distributions for .
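It can be shown (e.g. Lauritzen, 1996) that the density of a BN can then be written as the product of its conditional distributions, $p(y) = \prod_{i} p(y_i \mid y_{\Pi_i})$, where $\Pi_i$ denotes the parent set of vertex $i$ and the product runs over all vertices.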
Consider the DAG in Figure 0(a). A BN with this associated graph implies the conditional independences and . The probability distribution then factorizes as
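The factorization of a BN density into local conditional distributions can be illustrated on a small discrete example. The sketch below assumes a hypothetical binary network with edges 1 → 2, 1 → 3 and 2 → 3 and made-up conditional probability tables; it evaluates the joint probability of an assignment as a product of local terms and checks that these products sum to one.

```python
from itertools import product

# Hypothetical parent sets for a three-variable binary BN.
parent_sets = {1: (), 2: (1,), 3: (1, 2)}

# cpt[i][parent values] = assumed probability that variable i equals 1.
cpt = {
    1: {(): 0.6},
    2: {(0,): 0.3, (1,): 0.8},
    3: {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def local_prob(i, value, assignment):
    """Conditional probability of variable i given its parents in the assignment."""
    pa_values = tuple(assignment[j] for j in parent_sets[i])
    p_one = cpt[i][pa_values]
    return p_one if value == 1 else 1.0 - p_one

def joint_prob(assignment):
    """Joint probability as the product of the local conditional probabilities."""
    prob = 1.0
    for i in parent_sets:
        prob *= local_prob(i, assignment[i], assignment)
    return prob

# Summing the factorized joint over all assignments should give one.
total = sum(
    joint_prob(dict(zip((1, 2, 3), values)))
    for values in product((0, 1), repeat=3)
)
```

Only 1 + 2 + 4 = 7 numbers are specified here, which equals the full joint for three binary variables because this graph is complete; sparser graphs need fewer.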
4 Utility factorizations
4.1 Independence and factorizations
Whilst conditional independence is universally acknowledged as the gold standard to simplify probabilistic joint densities, for utility functions a variety of independence concepts have been used. One very common assumption is that a utility has additively independent attributes implying the additive utility factorization
A second approach for defining multivariate utility factorizations is to first identify utility independences. For this purpose we introduce the conditional utility function of given , ,
where and .
We say that is utility independent of given , , for , if and only if we can write
Utility independences then imply joint utility functions that have a simpler form. Let be a totally ordered set and let, for each , and be the set of indices that precede and follow in , respectively. Let be the set comprising all possible instantiations of , where each element is either or , , and let be an element of . Abbas (2010) showed that, by sequentially applying conditional utility independence statements according to the order of the elements in , any utility function can then be written as
and is the disutility function. So for example if each is utility independent of then equation (3) can be re-expressed as
This special case can be identified as the well-known multilinear utility factorization (Keeney and Raiffa, 1993).
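For intuition, the textbook two-attribute multilinear form of Keeney and Raiffa can be written out directly. The sketch below uses that standard form, not necessarily the notation of equation (4); the criterion weights and the univariate utilities are hypothetical, and the interaction weight is fixed by requiring the utility of the best outcome to equal one.

```python
# Hypothetical criterion weights; k12 follows from the normalization u(best, best) = 1.
k1, k2 = 0.5, 0.3
k12 = 1.0 - k1 - k2

def u1(y1):
    """Assumed univariate utility of the first attribute on [0, 1]."""
    return y1

def u2(y2):
    """Assumed univariate utility of the second attribute on [0, 1] (concave)."""
    return y2 ** 0.5

def multilinear_utility(y1, y2):
    """Two-attribute multilinear form: k1*u1 + k2*u2 + k12*u1*u2."""
    return k1 * u1(y1) + k2 * u2(y2) + k12 * u1(y1) * u2(y2)

# Utilities at the reference values (0 is worst and 1 is best for each attribute).
corners = {
    "worst": multilinear_utility(0.0, 0.0),
    "best": multilinear_utility(1.0, 1.0),
    "only_first_best": multilinear_utility(1.0, 0.0),
}
```

Setting the interaction weight to zero recovers an additive utility, so the product term is what lets this form capture preferences that an additive factorization cannot.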
4.2 Utility diagrams
Graphical models depicting various types of preferential independences have now begun to appear. In this paper we consider a specific class of models called utility diagrams (Abbas, 2010).
A utility diagram is a directed graph with vertex set and its edge set is such that the absence of an edge , , implies .
Note that Abbas (2010) defined utility diagrams as bidirectional graphs. However, given that our definition of a directed graph allows vertices to be connected by more than one edge, the model in Definition 3 is equivalent to the one of Abbas (2010), where a bidirected edge between two vertices is replaced by two edges, one pointing in each direction.
A utility diagram with empty edge set corresponds to a multilinear factorization of the utility function as in equation (4). Here we introduce a subclass of utility diagrams that has some important properties.
A utility diagram is said to be directional if its graph is a DAG.
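Whether a given utility diagram is directional can be tested by checking its graph for cycles. The sketch below uses Kahn's topological-sort algorithm, a standard technique that is our choice here rather than a routine taken from the paper, on two hypothetical diagrams over four attributes.

```python
from collections import deque

def is_dag(vertices, edges):
    """Kahn's algorithm: the graph is acyclic iff all vertices can be removed in topological order."""
    indegree = {v: 0 for v in vertices}
    for _, j in edges:
        indegree[j] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        i = queue.popleft()
        removed += 1
        # Removing vertex i lowers the in-degree of each of its children.
        for a, b in edges:
            if a == i:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return removed == len(vertices)

# Hypothetical utility diagrams: the second contains a pair of opposite edges.
directional = is_dag([1, 2, 3, 4], [(1, 2), (2, 4), (3, 4)])
bidirected = is_dag([1, 2, 3, 4], [(1, 2), (2, 1), (3, 4)])
```

A pair of edges pointing in opposite directions between the same two vertices already forms a cycle, so any diagram with a bidirected edge fails the test.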
The utility diagram in Figure 2 is directional and implies the following conditional utility independences
Directional utility diagrams have the unique property that their utility function can be written in terms of criterion weights and univariate utility functions only. Although not explicitly depicted by a utility graph, such a property underlies the algorithms developed in Leonelli and Smith (2015) that apply to some specific generalized additively independent models only.
For a directional utility diagram there exists an expansion order over such that equation (3) is a linear combination of terms involving only criterion weights and conditional utility functions having as argument a single attribute.
This result follows by observing that the terms in equation (3) coincide with since the expansion can be performed over all the attributes. These terms are functions of criterion weights. Furthermore the conditional independence structure underlying a directed utility diagram is such that there is an expansion order where UI . Thus in equation (3) is equal to for every .
Focusing on the subclass of directed utility diagrams has the great computational advantage of allowing for the computation of the expected utility of a DEUN through a backward inductive routine. At each step this computes a finite number of integrals over the sample space of one random variable only. More general utility dependence structures could also be studied by extending our methods: see Section 8 for a discussion. However, for simplicity in this paper we restrict ourselves to this special case.
5 Directed expected utility networks
We are now ready to define our graphical model which embeds both probabilistic and utility independence statements.
A directed expected utility network consists of a set of vertices , a probabilistic edge set , denoted by solid arrows, and a utility edge set , denoted by dashed arrows, such that:
is a BN model such that if then ;
is a directional utility diagram such that if then .
Consider the diagrams in Figure 3. Figure 2(a) includes a graph which is not a DEUN since there is a utility edge from to . This edge would make the computation of expected utilities via backward induction impossible. Figures 2(b) and 2(c) are DEUNs since for these is a BN and is a directed utility diagram both including only edges such that . Note that all three diagrams embed the BN in Figure 0(a), whilst only the diagram in Figure 2(c) embeds the utility diagram in Figure 2.
Note that a DEUN is not allowed to contain any cyclical structure in the edge set of the utility diagram. This is because such cycles would inhibit the computation of expected utility through a backward induction procedure where each node is considered individually and sequentially. Of course it may well be possible to develop more general algorithms by merging the vertices that are connected by such a cycle into a single chain component. However, the extended flexibility of having two different edge sets would then need to be offset against the potential loss of both structural information and computational speed.
We next introduce a subclass of DEUNs that entail fast computation routines.
A DEUN is said to be decomposable if
only if in .
The DEUN in Figure 2(b) is not decomposable since but these two vertices are not connected by a directed path in the underlying BN. Conversely the network in Figure 2(c) is decomposable. Note that the semantics of our model permit two vertices to be connected by both probabilistic and utility edges, by just one of the two, or potentially none. So for example and , whilst and .
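The decomposability condition for DEUNs can be checked with a simple reachability search. The sketch below encodes our reading of the definition, namely that every utility edge must be matched by a directed path between the same two vertices in the underlying BN. The edge sets are hypothetical and are not those of the figures.

```python
def reachable(source, target, edges):
    """Depth-first search for a directed path from source to target."""
    stack, seen = [source], set()
    while stack:
        i = stack.pop()
        if i == target:
            return True
        if i in seen:
            continue
        seen.add(i)
        stack.extend(b for a, b in edges if a == i)
    return False

def is_decomposable_deun(prob_edges, util_edges):
    """Every utility edge (i, j) needs a directed path from i to j in the BN."""
    return all(reachable(i, j, prob_edges) for i, j in util_edges)

# Hypothetical probabilistic edge set and two candidate utility edge sets.
prob_edges = {(1, 2), (2, 3), (1, 4)}
ok = is_decomposable_deun(prob_edges, {(1, 3), (1, 4)})
broken = is_decomposable_deun(prob_edges, {(2, 4)})
```

In the second case vertices 2 and 4 are linked by a utility edge but no directed path joins them in the BN, so the condition fails.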
Just as in the triangulation step for probabilistic propagation (e.g. Lauritzen, 1996), it can be fairly easily shown that any non-decomposable DEUN can be transformed into a decomposable one.
Let be a non-decomposable DEUN with vertex set and edges and . Let be a DEUN with vertex set and edges and , where
Then is decomposable.
This holds by noting that the set simply adds a probabilistic edge connecting two vertices linked by a utility edge which breaks the decomposability condition. The set then simply transforms the graph into a decomposable DAG.
For the non-decomposable network in Figure 2(b), the decomposability condition is achieved by simply adding to .
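One plausible way to carry out such a transformation is sketched below. It is an illustration under our own assumptions, not a transcription of the edge sets defined above: it first adds a probabilistic edge for each utility edge lacking a directed path, then joins unconnected parents of a common child until no such pair remains. Vertices are assumed to be integers labelled so that every edge points from a smaller to a larger label, which keeps the graph acyclic.

```python
from itertools import combinations

def has_path(i, j, edges):
    """True if a directed path from i to j exists."""
    frontier, seen = [i], set()
    while frontier:
        v = frontier.pop()
        if v == j:
            return True
        if v not in seen:
            seen.add(v)
            frontier.extend(b for a, b in edges if a == v)
    return False

def make_decomposable(prob_edges, util_edges):
    """Add probabilistic edges until both decomposability conditions hold.

    Assumes every edge (i, j) satisfies i < j.
    """
    edges = set(prob_edges)
    # Step 1: cover each offending utility edge with a probabilistic edge.
    for i, j in sorted(util_edges):
        if not has_path(i, j, edges):
            edges.add((i, j))
    # Step 2: join unconnected parents of a common child, repeating until stable.
    changed = True
    while changed:
        changed = False
        for child in {j for _, j in edges}:
            pa = sorted(i for i, j in edges if j == child)
            for a, b in combinations(pa, 2):
                if (a, b) not in edges:
                    edges.add((a, b))
                    changed = True
    return edges

# Hypothetical non-decomposable DEUN: the utility edge (2, 4) has no matching path.
fixed = make_decomposable({(1, 2), (2, 3), (1, 4)}, {(2, 4)})
```

In this example a single added edge suffices, since the two parents of vertex 4 are already joined by an edge.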
6 Computation of expected utilities
We next consider the computation of expected utilities for both non-decomposable and decomposable DEUNs and define algorithms based on backward inductive routines. All these routines have in common an operation working over vectors of (expected) utility functions that we define next. Let and be the parent sets of with respect to and , respectively. We let be the vector comprising the conditional utilities and disutilities given all possible combinations of the parents at the reference values and .
The vector has as its components
whilst the vector has the utility components
We next introduce an element-wise operation, denoted by , which multiplies an element of one vector, , with any element of another vector, , if these have compatible instantiations, i.e. if the common conditioning variables are instantiated to the same value. So in our Example 8, returns a vector with elements
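The compatibility rule behind this operation can be sketched in a few lines. In the illustration below each vector element is represented, as an assumption of ours, by a pair consisting of an instantiation of its conditioning variables at the reference values and a numeric term; in the paper the terms are functions rather than numbers. The variable names and values are hypothetical.

```python
def compatible(inst_a, inst_b):
    """Two instantiations are compatible if they agree on every shared variable."""
    return all(inst_b[v] == ref for v, ref in inst_a.items() if v in inst_b)

def elementwise_product(vec_a, vec_b):
    """Multiply every pair of elements whose instantiations are compatible."""
    result = []
    for inst_a, val_a in vec_a:
        for inst_b, val_b in vec_b:
            if compatible(inst_a, inst_b):
                # The product is indexed by the union of the two instantiations.
                result.append(({**inst_a, **inst_b}, val_a * val_b))
    return result

# Hypothetical vectors: "0" marks the worst reference value and "*" the best.
vec_a = [({"y1": "0"}, 0.2), ({"y1": "*"}, 0.7)]
vec_b = [({"y1": "0", "y2": "*"}, 0.5), ({"y1": "*", "y2": "*"}, 0.4)]
product_vec = elementwise_product(vec_a, vec_b)
```

Of the four possible pairings only two survive, because the remaining two disagree on the value at which the shared variable is instantiated.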