1 Introduction
Many discrete statistical problems in a variety of domains are often modeled using Bayesian networks (BNs) Pearl (1988). There are now thousands of practical applications of these models Aguilera et al. (2011); Cano et al. (2004); Heckerman et al. (1995); Jordan (2004), which have spawned many useful technical developments, including a variety of fast exact, approximate and symbolic propagation algorithms for the computation of probabilities that exploit the underlying graph structure Cowell et al. (1999); Dagum and Horvitz (1993); Darwiche (2003). Some of these advances have been hardwired into software Chan and Darwiche (2002); Korb and Nicholson (2010); Low-Choy et al. (2012), which has further increased the applicability and success of these methods.
However, BN modeling would not have experienced such widespread application without tailored methodologies of model validation, i.e. checking that a model produces outputs that are in line with current understanding, following a defensible and expected mechanism French (2003); Pitchforth and Mengersen (2013). Such techniques are now well established for BN models Chen and Pollino (2012); Korb and Nicholson (2010); Pitchforth and Mengersen (2013); Pollino et al. (2007). These are especially fundamental for expert-elicited models, where both the probabilities and the covariance structure are defined from the suggestions of domain experts, following knowledge engineering protocols tailored to the BN’s building process Neil et al. (2000); Rajabally et al. (2004). We can broadly break down the validation process into two steps: the first concerns the auditing of the underlying graphical structure; the second, assuming the graph represents a user’s beliefs, checks the impact of the elicited numerical probabilities within this parametric family on outputs of interest. The focus of this paper lies in this second validation phase, usually called a sensitivity analysis.
The most common investigation is the so-called one-way sensitivity analysis, where the impacts of changes made to a single probability parameter are studied. Analyses where more than one parameter at a time is varied are usually referred to as multi-way. In both cases a complete sensitivity analysis for discrete BNs often involves the study of Chan-Darwiche (CD) distances Chan and Darwiche (2002, 2004, 2005a) and sensitivity functions Coupé and van der Gaag (2002); van der Gaag et al. (2007). The CD distance is used to quantify global changes. It measures how the overall distribution behaves when one (or more) parameter is varied. A significant proportion of research has focused on identifying parameter changes such that the original and the ‘varied’ BN distributions are close in CD distance Chan and Darwiche (2005a); Renooij (2014). This is minimized when, after a single arbitrary parameter change, other covarying parameters, e.g. those from the same conditional distribution, have the same proportion of the residual probability mass as they originally had. Sensitivity functions, on the other hand, model local changes with respect to an output of interest. These describe how that output probability varies as one (or potentially more) parameter is allowed to change.
Although both these concepts can be applied to generic Bayesian analyses, they have almost exclusively been discussed and applied only within the BN literature (see Chan and Darwiche (2005b); Charitos and van der Gaag (2006a, b); Renooij (2012) for some exceptions). This is because the computations of both CD distances and sensitivity functions are particularly straightforward for BN models.
In this paper we introduce a unifying comprehensive framework for certain multi-way analyses, usually called in the context of BNs single full conditional probability table (CPT) analyses, where one parameter from each CPT of one vertex of a BN, given each configuration of its parents, is varied. Using the notion of an interpolating polynomial Pistone et al. (2001) we are able to describe a large variety of models based on their polynomial form. Then, given this algebraic characterization, we demonstrate that one-way sensitivity methods defined for BNs can be generalized to single full CPT analyses for any model whose interpolating polynomial is multilinear, for example context-specific BNs Boutilier et al. (1996) and chain event graphs Smith and Anderson (2008). Because of both the lack of theoretical results justifying their use and the increase in computational complexity, multi-way methods have not been extensively discussed in the literature: see Bolt and van der Gaag (2015); Chan and Darwiche (2004); Gómez-Villegas et al. (2013) for some exceptions. This paper aims at providing a comprehensive theoretical toolbox to start applying such analyses in practice.
Importantly, our polynomial approach enables us to prove that single full CPT analyses in any multilinear polynomial model are optimal under proportional covariation in the sense that the CD distance between the original and the varied distributions is minimized. The optimality of this covariation method has been an open problem in the sensitivity analysis literature for quite some time Chan and Darwiche (2004); Renooij (2014). Moreover, we are able to provide further theoretical justifications for the use of proportional covariation in single full CPT analyses. We demonstrate below that for any multilinear model this scheme minimizes not only the CD distance, but also any divergence in the family of $\phi$-divergences Ali and Silvey (1966); Csiszár (1963). The class of $\phi$-divergences includes a very large number of divergences and distances (see e.g. Pardo (2005) for a review), including the famous Kullback-Leibler (KL) divergence Kullback and Leibler (1951). The application of KL divergences in sensitivity analyses of BNs has been almost exclusively restricted to the case when the underlying distribution is assumed Gaussian Gómez-Villegas et al. (2007, 2013), because in discrete BNs the computation of such a divergence requires more computational power than for CD distances. We will demonstrate below that this additional complexity is a feature shared by any divergence in the family of $\phi$-divergences.
However, by studying sensitivity analysis from a polynomial point of view, we are able to consider a much larger class of models for which such methods are very limited. We investigate the properties of one-way sensitivity analysis in models whose interpolating polynomial is not multilinear, which are usually associated to dynamic settings where probabilities are recursively defined. This difference gives us an even richer class of sensitivity functions, as shown in Charitos and van der Gaag (2006a, b); Renooij (2012) for certain dynamic BN models, which are not simply linear but more generally polynomial. We further introduce a procedure to compute the CD distance in these models and demonstrate that no unique updating of covarying parameters leads to the smallest CD distance between the original and the varied distribution.
The paper is structured as follows. In Section 2 we define interpolating polynomials and demonstrate that many commonly used models entertain a polynomial representation. In Section 3 we review a variety of divergence measures. Section 4 presents a variety of results for single full CPT sensitivity analyses in multilinear models. In Section 5 the focus moves to non-multilinear models and one-way analyses. We conclude with a discussion.
2 Multilinear and polynomial parametric models
In this section we first provide a generic definition of a parametric statistical model together with the notion of interpolating polynomial. We then categorize parametric models according to the form of their interpolating polynomial and show that many commonly used models fall within two classes.
2.1 Parametric models and interpolating polynomials
Let $Y=(Y_1,\dots,Y_m)$ be a random vector with an associated discrete and finite sample space $\mathbb{Y}$, with $\#\mathbb{Y}=n$. Although our methods straightforwardly apply when the entries of $Y$ are random vectors, for ease of notation, we henceforth assume its elements are univariate. Denote by $p_\theta=(p_\theta(y))_{y\in\mathbb{Y}}$ the vector of values of a probability mass function $p_\theta:\mathbb{Y}\to[0,1]$ which depends on a choice of parameters $\theta=(\theta_1,\dots,\theta_k)$. The entries of $p_\theta$ are called atomic probabilities and the elements $y\in\mathbb{Y}$ atoms.
A discrete parametric statistical model on $n$ atoms is a subset $\mathbb{P}_\Psi$ of the $(n-1)$-dimensional probability simplex, where
$$\Psi:\ \theta\mapsto p_\theta \tag{1}$$
is a bijective map identifying a particular choice of parameters with one vector of atomic probabilities. The map $\Psi$ is called a parametrisation of the model.
The above definition is often encountered in the field of algebraic statistics, where properties of statistical models are studied using techniques from algebraic geometry and commutative computer algebra, among others Drton et al. (2009); Riccomagno (2009). We next follow Görgen and Smith (2015) in extending some standard terminology.
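A model $\mathbb{P}_\Psi$ has a monomial parametrisation if
$$p_\theta(y)=\theta^{\alpha_y},\qquad \text{for all } y\in\mathbb{Y},$$
where $\alpha_y=(\alpha_{y,1},\dots,\alpha_{y,k})\in\mathbb{Z}_{\geq 0}^k$ denotes a vector of exponents and $\theta^{\alpha_y}=\theta_1^{\alpha_{y,1}}\cdots\theta_k^{\alpha_{y,k}}$ is a monomial. Then equation (1) is a monomial map and $p_\theta(y)\in\mathbb{R}[\theta]$, for all $y\in\mathbb{Y}$. Here $\theta=(\theta_1,\dots,\theta_k)$ is the set of indeterminates and $\mathbb{R}[\theta]$ is the polynomial ring over the field $\mathbb{R}$.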
For models entertaining a monomial parametrisation the network polynomial we introduce in Definition 2.1 below concisely captures the model structure and provides a platform to answer inferential queries Darwiche (2003); Görgen et al. (2015).
The network polynomial of a model $\mathbb{P}_\Psi$ with monomial parametrisation is given by
$$c_{\mathbb{P}_\Psi}(\theta,\lambda)=\sum_{y\in\mathbb{Y}}\lambda_y\,\theta^{\alpha_y},$$
where $\lambda_y$ is an indicator function for the atom $y$. Probabilities of events in the underlying sigma-field can be computed from the network polynomial by setting equal to one the indicator function of atoms associated to that event. In the following it will be convenient to work with a special case of the network polynomial where all the indicator functions are set to one. The interpolating polynomial of a model $\mathbb{P}_\Psi$ with monomial parametrisation is given by the sum of all atomic probabilities,
$$c_{\mathbb{P}_\Psi}(\theta)=\sum_{\alpha\in\mathbb{A}}\theta^{\alpha},$$
where $\mathbb{A}=\{\alpha_y \mid y\in\mathbb{Y}\}$.
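The way event probabilities are read off the network polynomial can be illustrated with a minimal sketch. The two-variable model, the parameter names (`a0`, `b1|0`, ...) and the numerical values below are hypothetical and chosen only for illustration; the helper functions are not taken from any library.

```python
from itertools import product

# Hypothetical toy model with two binary variables:
# p(y1, y2) = P(Y1 = y1) * P(Y2 = y2 | Y1 = y1).
theta = {
    "a0": 0.3, "a1": 0.7,        # P(Y1 = 0), P(Y1 = 1)
    "b0|0": 0.9, "b1|0": 0.1,    # P(Y2 = . | Y1 = 0)
    "b0|1": 0.4, "b1|1": 0.6,    # P(Y2 = . | Y1 = 1)
}

# Monomial parametrisation: each atom is mapped to the parameters in its monomial.
monomials = {
    (y1, y2): (f"a{y1}", f"b{y2}|{y1}")
    for y1, y2 in product((0, 1), repeat=2)
}

def monomial_value(atom):
    """Evaluate the monomial (atomic probability) of one atom."""
    value = 1.0
    for name in monomials[atom]:
        value *= theta[name]
    return value

def network_polynomial(indicators):
    """Sum of indicator times monomial over all atoms."""
    return sum(indicators[atom] * monomial_value(atom) for atom in monomials)

def interpolating_polynomial():
    """Network polynomial with every indicator set to one."""
    return network_polynomial({atom: 1 for atom in monomials})

# Event {Y2 = 1}: switch on only the indicators of atoms with y2 = 1.
event = {atom: int(atom[1] == 1) for atom in monomials}
prob_y2_is_1 = network_polynomial(event)
total = interpolating_polynomial()
```

With these values the event probability is 0.3 x 0.1 + 0.7 x 0.6, and the interpolating polynomial evaluates to one because it sums all atomic probabilities.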
2.2 Multilinear models
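In this work we will mostly focus on parametric models whose interpolating polynomial is multilinear. We say that a parametric model is multilinear if its associated interpolating polynomial is multilinear, i.e. if every indeterminate has exponent zero or one in each monomial, so that all exponent vectors lie in $\{0,1\}^k$, with $k$ the number of indeterminates.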
We note here that a large portion of well-known non-dynamic graphical models are multilinear. We explicitly show below that this is the case for BNs and context-specific BNs Boutilier et al. (1996). In Görgen et al. (2015) we showed that certain chain event graph models Smith and Anderson (2008) have a multilinear interpolating polynomial. In addition, decomposable undirected graphs and probabilistic chain graphs Lauritzen (1996) can be defined to have a monomial parametrisation whose associated interpolating polynomial is multilinear. An example of models not entertaining a monomial parametrisation in terms of atomic probabilities are non-decomposable undirected graphs, since their joint distribution can be written as a rational function of multilinear functions Chan and Darwiche (2005b).
2.2.1 Bayesian networks
For an , let . We denote with ,
, a generic discrete random variable and with
its associated sample space. For an , we let and . Recall that for three random vectors , and , we say that is conditionally independent of given , and write , if . A BN over a discrete random vector consists of
conditional independence statements of the form , where ;

a directed acyclic graph (DAG) with vertex set and edge set ;

conditional probabilities for every , and .
The vector , , includes the parents of the vertex , i.e. those vertices such that there is an edge in the DAG of the BN.
From Chan and Darwiche (2004) we know that for any atom its associated monomial in the network polynomial can be written as
where denotes the compatibility relation among instantiations.
Lemma 1.
From Equation (2) we can immediately deduce the following.
Proposition 1.
A BN is a multilinear parametric model, whose interpolating polynomial is homogeneous with monomials of degree .
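Proposition 1 can be checked numerically on a small case. The sketch below assumes a hypothetical chain BN Y1 -> Y2 -> Y3 over binary variables, assigns one indeterminate to each CPT entry, and builds the exponent vector of every atom; the names `params` and `exponent_vector` are illustrative only.

```python
from itertools import product

m = 3  # number of binary variables in the hypothetical chain Y1 -> Y2 -> Y3

# One indeterminate per CPT entry, keyed by (vertex, value, parent value or None).
params = [(1, j, None) for j in (0, 1)]
params += [(i, j, pa) for i in (2, 3) for pa in (0, 1) for j in (0, 1)]
index = {p: n for n, p in enumerate(params)}

def exponent_vector(atom):
    """Exponent vector of the monomial associated with one atom (y1, y2, y3)."""
    alpha = [0] * len(params)
    parent = None
    for i, value in enumerate(atom, start=1):
        alpha[index[(i, value, parent)]] += 1
        parent = value  # in a chain, the parent of Y_{i+1} is Y_i
    return tuple(alpha)

exponents = [exponent_vector(atom) for atom in product((0, 1), repeat=m)]

# Multilinear: every exponent is 0 or 1. Homogeneous: all monomials share one degree.
is_multilinear = all(e in (0, 1) for alpha in exponents for e in alpha)
degrees = {sum(alpha) for alpha in exponents}
```

Each atom picks exactly one entry from the CPT of each vertex, so every monomial is a product of `m` distinct indeterminates, which is why `degrees` contains the single value `m` in this example.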
Suppose a newborn is at risk of acquiring a disease and her parents are offered a screening test () which can be either positive () or negative (). Given that the newborn can either severely () or mildly () contract the disease or remain healthy (), her parents can then decide whether or not to give her a vaccine to prevent a relapse ( and , respectively). We assume that the parents’ decision about the vaccine does not depend on the screening test if the newborn contracted the disease, and that the probability of being severely or mildly affected by the disease is equal for negative screening tests.
The above situation can be described, with some loss of information, by the BN in Figure 1, with probabilities, for and ,
Its associated interpolating polynomial has degree and equals
2.2.2 Context-specific Bayesian networks
In practice it has been recognized that often conditional independence statements do not hold over the whole sample space of certain conditioning variables but only for a subset of this, usually referred to as a context. A variety of methods have been introduced to embellish a BN with additional independence statements that hold only over contexts. A BN equipped with such embellishments is usually called a context-specific BN. Here we consider the representation known as context-specific independence (CSI)-trees, introduced in Boutilier et al. (1996).
Consider the medical problem in Example 2.2.1. Using the introduced notation, we notice that by assumption, for each , the probabilities are equal for all and called . Similarly, are equal and called , . Also are equal and called . The first two constraints can be represented by the CSI-tree in Figure 2, where the inner nodes are random variables and the leaves are entries of the CPTs of one vertex. The tree shows that, if or then no matter what the value of is, the CPT for will be equal to and respectively.
The last constraint cannot be represented by a CSI-tree and is usually referred to as a partial independence (Pensar et al., 2016). In our polynomial approach, both partial and context-specific independences can be straightforwardly imposed in the interpolating polynomial representation of the model. In fact the interpolating polynomial for the model in this example corresponds to the polynomial in equation (2) where the appropriate indeterminates are substituted with , and . This polynomial is again multilinear and homogeneous, just like for all context-specific BNs embellished with CSI-trees and partial independences.
We notice here that the interpolating polynomial of a multilinear model is not necessarily homogeneous, as for example the one associated to certain chain event graph models, as shown in Görgen et al. (2015).
2.3 Non-multilinear models
Having discussed multilinear models, we now introduce more general structures which are often encountered in dynamic settings. Although many more models have this property, for instance dynamic chain graphs Anacleto and Queen (2013) and dynamic chain event graphs Barclay et al. (2015), for the purposes of this paper we focus here on the most commonly used model class of dynamic Bayesian networks (DBNs) (Murphy, 2002). In Görgen et al. (2015) we showed that the so-called non-square-free chain event graph is also a non-multilinear model.
2.3.1 Dynamic Bayesian networks
DBNs extend the BN framework to dynamic and stochastic domains. As is common in practice, we consider only stationary, feed-forward DBNs respecting the first-order Markov assumption with a finite horizon , see e.g. (Koller and Lerner, 2001). This assumes that probabilities do not vary when shifted in time (stationarity), that current states only depend on the previous time point (first-order Markov assumption) and that contemporaneous variables cannot directly affect each other (feed-forward). These DBNs can be simply described by an initial distribution over the first time point and a BN having as vertex set two generic time slices. The latter BN is usually called a 2-Time slice Bayesian Network (2-TBN). Let be a time series. A 2-TBN for a time series is a BN with DAG such that and its edge set is such that there are no edges , , , .
A DBN for a time series is a pair , such that is a BN with vertex set , and is a 2-TBN such that its vertex set is equal to .
Consider the problem of Example 2.2.1 and suppose the newborn can acquire the disease once a year. Suppose further that the screening test and the vaccine are available for kids up to four years old. This scenario can be modeled by a DBN with time horizon , where , , , corresponds to the variable of Example 2.2.1 measured in the th year. Suppose that the probabilities of parents choosing the screening test and vaccination depend on whether or not the newborn acquired the disease in the previous year only. Furthermore, there is evidence that kids have a higher chance of contracting the disease if they were sick the previous year, whilst a lower chance if vaccination was chosen. This situation can be described by the DBN in Figure 3 where at time the correlation structure of the non-dynamic problem is assumed.
For a finite time horizon , the interpolating polynomial has monomials each of degree . To show that this polynomial is not multilinear, consider the event that the screening test is always positive, that the parents always decline vaccination and that the newborn gets mildly sick in her first three years of life, denoted as . Let the parameters for the first time slice be denoted as in Example 2.2.1 and denote for
The interpolating polynomial for this event equals
(3) 
which has indeterminates of degree 3 and 2 and therefore is not multilinear.
Note that in the example above indeterminates can have degree up to , since this corresponds to the longest length of a path where the visited vertices can have probabilities that are identified in the ‘unrolled’ version of the DBN, i.e. one where the 2-TBN graph for time is recursively collated to the one of time . From this observation the following follows.
Proposition 2.
A DBN is a parametric model with monomial parametrisation, whose interpolating polynomial is homogeneous and each indeterminate can have degree lower than or equal to .
As for multilinear models, the interpolating polynomial of a non-multilinear model can be non-homogeneous. This is the case for example for certain non-square-free chain event graphs.
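The loss of multilinearity under stationarity can be seen in a minimal sketch. It assumes the simplest possible stationary DBN, a single binary variable whose 2-TBN has one edge from the previous to the current time slice; this toy chain, the horizon `T = 4` and the parameter labels are assumptions made for illustration and are not the model of Figure 3.

```python
from collections import Counter

def path_exponents(path):
    """Exponent of each parameter in the monomial of one state path.

    Stationarity means the same transition parameter is reused at every
    time step, so exponents larger than one can appear.
    """
    counts = Counter()
    counts[("init", path[0])] += 1
    for prev, curr in zip(path, path[1:]):
        counts[("trans", curr, prev)] += 1
    return counts

T = 4                              # hypothetical finite horizon
exps = path_exponents((1,) * T)    # the variable takes value 1 at every time point
degree = sum(exps.values())        # total degree of the monomial
max_exp = max(exps.values())       # largest exponent of a single indeterminate
```

In this toy chain every monomial has total degree `T`, while the transition parameter for staying in state 1 appears with exponent `T - 1`, so the interpolating polynomial is homogeneous but not multilinear.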
3 Divergence measures
In sensitivity analyses for discrete parametric statistical models we are often interested in studying how far apart two vectors of values of two probability mass functions $p_\theta$ and $p_{\tilde\theta}$ from the same model $\mathbb{P}_\Psi$ are. Divergence measures are used to quantify this dissimilarity between probability distributions. In this section we provide a brief introduction to these functions within the context of our discrete parametric probability models.
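A divergence measure $\mathcal{D}$ within a discrete parametric probability model $\mathbb{P}_\Psi$ is a function $\mathcal{D}:\mathbb{P}_\Psi\times\mathbb{P}_\Psi\to\mathbb{R}$ such that for all $p_\theta,p_{\tilde\theta}\in\mathbb{P}_\Psi$:

$\mathcal{D}(p_{\tilde\theta},p_\theta)\geq 0$;

$\mathcal{D}(p_{\tilde\theta},p_\theta)=0$ iff $p_{\tilde\theta}=p_\theta$.
The larger the divergence between two probability mass functions $p_\theta$ and $p_{\tilde\theta}$, the more dissimilar these are. Notice that divergences are not formally metrics, since these do not have to be symmetric and respect the triangle inequality. We will refer to divergences with these two additional properties as distances.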
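The divergence most commonly used in practice is the KL divergence Kullback and Leibler (1951). The KL divergence between $p_{\tilde\theta},p_\theta\in\mathbb{P}_\Psi$, $\mathcal{D}_{KL}(p_{\tilde\theta}\,\|\,p_\theta)$, is defined as
$$\mathcal{D}_{KL}(p_{\tilde\theta}\,\|\,p_\theta)=\sum_{y\in\mathbb{Y}}p_{\tilde\theta}(y)\log\frac{p_{\tilde\theta}(y)}{p_\theta(y)}, \tag{4}$$
assuming $p_\theta(y),p_{\tilde\theta}(y)\neq 0$ for all $y\in\mathbb{Y}$. Notice that the KL divergence is not symmetric and thus $\mathcal{D}_{KL}(p_{\tilde\theta}\,\|\,p_\theta)\neq\mathcal{D}_{KL}(p_\theta\,\|\,p_{\tilde\theta})$ in general. However both divergences can be shown to be particular instances of a very general family of divergences, called $\phi$-divergences Ali and Silvey (1966); Csiszár (1963). The $\phi$-divergence between $p_{\tilde\theta},p_\theta\in\mathbb{P}_\Psi$, $\mathcal{D}_\phi(p_{\tilde\theta}\,\|\,p_\theta)$, is defined as
$$\mathcal{D}_\phi(p_{\tilde\theta}\,\|\,p_\theta)=\sum_{y\in\mathbb{Y}}p_\theta(y)\,\phi\!\left(\frac{p_{\tilde\theta}(y)}{p_\theta(y)}\right),\qquad \phi\in\Phi, \tag{5}$$
where $\Phi$ is the class of convex functions $\phi(x)$, $x\geq 0$, such that $\phi(1)=0$, $0\,\phi(0/0)=0$ and $0\,\phi(x/0)=x\lim_{u\to\infty}\phi(u)/u$. So for example $\mathcal{D}_{KL}(p_{\tilde\theta}\,\|\,p_\theta)=\mathcal{D}_\phi(p_{\tilde\theta}\,\|\,p_\theta)$ for $\phi(x)=x\log(x)-x+1$ and $\mathcal{D}_{KL}(p_\theta\,\|\,p_{\tilde\theta})=\mathcal{D}_\phi(p_{\tilde\theta}\,\|\,p_\theta)$ for $\phi(x)=-\log(x)+x-1$. Many other renowned divergences are in the family of $\phi$-divergences: for example $J$-divergences Jeffreys (1946) and total variation distances (see Pardo (2005) for a review).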
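The relation between the KL divergence and the wider family can be verified numerically. The sketch below assumes strictly positive probability vectors, so the boundary conventions for zero entries are not needed, and uses the standard generating functions x log x - x + 1 and -log x + x - 1; the two distributions are hypothetical.

```python
import math

def kl(p_new, p_old):
    """KL divergence of p_new from p_old (both strictly positive)."""
    return sum(a * math.log(a / b) for a, b in zip(p_new, p_old))

def phi_divergence(p_new, p_old, phi):
    """Sum over atoms of p_old(y) * phi(p_new(y) / p_old(y))."""
    return sum(b * phi(a / b) for a, b in zip(p_new, p_old))

def phi_kl(x):
    return x * math.log(x) - x + 1      # generates KL(p_new || p_old)

def phi_reverse_kl(x):
    return -math.log(x) + x - 1         # generates KL(p_old || p_new)

# Hypothetical original and varied distributions over three atoms.
p_old = [0.5, 0.3, 0.2]
p_new = [0.4, 0.4, 0.2]

forward = kl(p_new, p_old)
backward = kl(p_old, p_new)
```

The extra terms `-x + 1` and `x - 1` contribute zero in total because both vectors sum to one; they are only there to make the generating function nonnegative with a minimum at one.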
The distance usually considered to study the dissimilarity of two probability mass functions in sensitivity analyses for discrete BNs is the aforementioned Chan-Darwiche distance. This distance is not a member of the $\phi$-divergence family. The CD distance between $p_{\tilde\theta},p_\theta\in\mathbb{P}_\Psi$, $\mathcal{D}_{CD}(p_{\tilde\theta},p_\theta)$, is defined as
$$\mathcal{D}_{CD}(p_{\tilde\theta},p_\theta)=\log\max_{y\in\mathbb{Y}}\frac{p_{\tilde\theta}(y)}{p_\theta(y)}-\log\min_{y\in\mathbb{Y}}\frac{p_{\tilde\theta}(y)}{p_\theta(y)}, \tag{6}$$
where $0/0$ is defined as 1. It has been noted that in sensitivity analysis in BNs, if one parameter of one CPT is varied, then the CD distance between the original and the varied BN equals the CD distance between the original and the varied CPT Chan and Darwiche (2005a). This distributive property, and its associated computational simplicity, has led to a wide use of the CD distance in sensitivity studies in discrete BNs.
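The distributive property can be illustrated on a small case. The sketch below assumes the usual definition of the CD distance (log of the largest probability ratio minus log of the smallest, with 0/0 read as 1) and a hypothetical two-vertex BN Y1 -> Y2 in which the distribution of Y1 is varied; all numbers are illustrative, and the function assumes both distributions share the same support.

```python
import math

def cd_distance(p_new, p_old):
    """CD distance, assuming p_new and p_old have the same support."""
    ratios = []
    for a, b in zip(p_new, p_old):
        if a == 0 and b == 0:
            ratios.append(1.0)   # convention: 0/0 = 1
        else:
            ratios.append(a / b)
    return math.log(max(ratios)) - math.log(min(ratios))

# Hypothetical BN Y1 -> Y2: Y1 has three states, Y2 is binary.
cpt1_old = [0.2, 0.3, 0.5]
cpt1_new = [0.4, 0.225, 0.375]   # first entry varied, the others rescaled proportionally
cpt2 = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]   # rows: P(Y2 = . | Y1 = row)

def joint(cpt1):
    """Vector of atomic probabilities p(y1, y2)."""
    return [cpt1[y1] * cpt2[y1][y2] for y1 in range(3) for y2 in range(2)]

cd_joint = cd_distance(joint(cpt1_new), joint(cpt1_old))
cd_cpt = cd_distance(cpt1_new, cpt1_old)
```

The unchanged entries of `cpt2` cancel in every ratio of atomic probabilities, so the distance between the two joint distributions coincides with the distance between the two versions of the varied CPT.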
4 Sensitivity analysis in multilinear models
We can now formalize sensitivity analysis techniques for multilinear parametric models. We focus on an extension of single full CPT analyses from BNs to generic multilinear models. Standard one-way sensitivity analyses can be seen as a special case of single full CPT analyses when only one parameter is allowed to be varied. We demonstrate in this section that all the results about one-way sensitivity analysis in BN models extend to single full CPT analyses in multilinear parametric models and therefore hold under much weaker assumptions about the structure of both the sample space and the underlying conditional independences. Before presenting these results we review the theory of covariation.
4.1 Covariation
In one-way analyses one parameter within a parametrisation of a model is varied. When this is done, some of the remaining parameters need to be varied as well to respect the sum-to-one condition, so that the resulting measure is a probability measure. In the binary case this is straightforward, since the second parameter will be equal to one minus the other. But in generic discrete finite cases there are various considerations the user needs to take into account, as reviewed below.
Let $\theta_i$ be the parameter varied to $\tilde\theta_i$ and suppose this is associated to a random variable $Y_C$ in the random vector $Y$. Let $\theta_C=\{\theta_1,\dots,\theta_i,\dots,\theta_r\}$ be the subset of the parameter set including $\theta_i$ describing the probability distribution of $Y_C$ and whose elements need to respect the sum-to-one condition. For instance $\theta_C$ would include the entries of a CPT for a fixed combination of the parent variables in a BN model or the entries of a CPT associated to the conditional random variable from a leaf of a CSI-tree as in Figure 2. Suppose further these parameters are indexed according to their values, i.e. $\theta_1\leq\dots\leq\theta_r$. From Renooij (2014) we then have the following definition. Let $\theta_i\in\theta_C$ be varied to $\tilde\theta_i$. A covariation scheme $\sigma(\theta_j,\tilde\theta_i)$ is a function that takes as input the value of both $\tilde\theta_i$ and $\theta_j\in\theta_C$ and returns an updated value for $\theta_j$, denoted as $\tilde\theta_j$.
Different covariation schemes may entertain different properties which, depending on the domain of application, might be more or less desirable. We now list some of these properties from Renooij (2014). In the notation of Definition 4.1, a covariation scheme $\sigma(\theta_j,\tilde\theta_i)$ is

valid, if $\sum_{j=1}^{r}\sigma(\theta_j,\tilde\theta_i)=1$;

impossibility preserving, if for any parameter $\theta_j=0$, $j\neq i$, we have that $\sigma(\theta_j,\tilde\theta_i)=0$;

order preserving, if $\sigma(\theta_1,\tilde\theta_i)\leq\dots\leq\sigma(\theta_r,\tilde\theta_i)$;

identity preserving, if $\sigma(\theta_j,\theta_i)=\theta_j$, $j=1,\dots,r$;

linear, if $\sigma(\theta_j,\tilde\theta_i)=\gamma_j\tilde\theta_i+\delta_j$, for $\gamma_j\in\mathbb{R}$ and $\delta_j\in\mathbb{R}$.
Of course any covariation scheme needs to be valid, otherwise the resulting measure is not a probability measure and any inference from the model would be misleading. Applying a linear scheme is very natural: if for instance $\gamma_j=-\delta_j$, then $\sigma(\theta_j,\tilde\theta_i)=\delta_j(1-\tilde\theta_i)$ and the scheme assigns a proportion $\delta_j$ of the remaining probability mass $1-\tilde\theta_i$ to the remaining parameters. Following Renooij (2014) we now introduce a number of frequently applied covariation schemes. In the notation of Definition 4.1, we define

the proportional covariation scheme, , as

the uniform covariation scheme, , for , as

the order preserving covariation scheme, , for , as
where is the upper bound for and is the original mass of the parameters succeeding in the ordering.
Table 1 summarizes which of the properties introduced in Definition 4.1 the above schemes entertain (see Renooij (2014) for more details). Under proportional covariation, all the covarying parameters are assigned the same proportion of the remaining probability mass as these originally had. Although this scheme is not order preserving, it maintains the order among the covarying parameters. The uniform scheme on the other hand gives the same amount of the remaining mass to all covarying parameters. In addition, although the order preserving scheme is the only one that entertains the order preserving property, this limits the possible variations allowed. Note that this scheme is not simply linear but, more precisely, piecewise linear, i.e. a function composed of straight-line sections. All the schemes in Definition 4.1 are domain independent and therefore can be applied with no prior knowledge about the application of interest. Other schemes, for instance domain dependent or non-linear, have been defined, but these are not of interest for the theory we develop here.
Scheme / Property   valid   impossibility preserving   order preserving   identity preserving   linear
Proportional        ✓       ✓                          ✗                  ✓                     ✓
Uniform             ✓       ✗                          ✗                  ✗                     ✓
Order Preserving    ✓       ✓                          ✓                  ✓                     ✓
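The proportional and uniform rows of Table 1 can be checked on a small case. The sketch below implements the two schemes as they are described in words above (same proportion of the remaining mass as before, and equal shares of the remaining mass, respectively); the function names, the zero-based index `i` and the parameter values are hypothetical. It assumes the varied parameter is not originally equal to one.

```python
def proportional(theta, i, new_value):
    """Vary theta[i] to new_value; rescale the others by a common factor."""
    scale = (1.0 - new_value) / (1.0 - theta[i])
    return [new_value if j == i else scale * t for j, t in enumerate(theta)]

def uniform(theta, i, new_value):
    """Vary theta[i] to new_value; split the remaining mass equally."""
    share = (1.0 - new_value) / (len(theta) - 1)
    return [new_value if j == i else share for j in range(len(theta))]

# Hypothetical ordered parameters summing to one; the first one is impossible.
theta = [0.0, 0.2, 0.3, 0.5]

pro = proportional(theta, 1, 0.6)        # approximately [0.0, 0.6, 0.15, 0.25]
uni = uniform(theta, 1, 0.6)             # each covarying entry gets 0.4 / 3

pro_identity = proportional(theta, 1, theta[1])
uni_identity = uniform(theta, 1, theta[1])
```

Both outputs sum to one (valid). The proportional scheme keeps the zero entry at zero and returns the original vector when the parameter is not actually changed, whereas the uniform scheme does neither. After the proportional update the varied parameter overtakes the others, but the covarying ones keep their relative order.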
4.2 Sensitivity functions
We now generalize one-way sensitivity methods in BNs to the single full CPT case for general multilinear models. This type of analysis is simpler than other multi-way methods since the parameters varied/covaried never appear in the same monomial of the BN interpolating polynomial. So we now find an analogous CPT analysis in multilinear models which has the same property. Suppose we vary parameters and denote by , , the set of parameters including and associated to the same (conditional) random variable: thus respecting the sum-to-one condition. Assume these sets are such that . Note that a collection of such sets can not only be associated to the CPTs of one vertex given different parent configurations, but also, for instance, to the leaves of a CSI-tree as in Figure 2 or to the positions along the same cut in a CEG Smith and Anderson (2008).
We start by investigating sensitivity functions. These describe the effect of the variation of the parameters on the probability of an event of interest. A sensitivity function equals the probability and is a function in , where are varied to . Our parametric definition of a statistical model enables us to explicitly express these as functions of the covariation scheme for any multilinear model. Recall that and let . Let be the subsets of and respectively including the exponents where the entry associated to an indeterminate in is not zero, and be the subsets including the exponents such that the entry relative to is not zero, . Formally
Let be the sets including the elements in and , respectively, where the entry relative to is deleted. Lastly, let .
Proposition 3.
Consider a multilinear model where the parameters are varied to and is covaried according to a valid scheme , , . The sensitivity function can then be written as
(7) 
Proof.
The probability of interest can be written as
The result follows by substituting the varying parameters with their varied version. ∎
From Proposition 3 we can deduce that for a multilinear model, under a linear covariation scheme, the sensitivity function is multilinear.
Corollary 1.
Under the conditions of Proposition 3 and the linear covariation schemes , the sensitivity function equals
(8) 
where
(9) 
Proof.
The result follows by substituting the definition of a linear covariation scheme into equation (7) and then rearranging. ∎
Therefore, under a linear covariation scheme, the sensitivity function is a multilinear function of the varying parameters , . This was long known for BN models Castillo et al. (1997); Renooij (2014); van der Gaag et al. (2007). However, we have proven here that this feature is shared amongst all models having a multilinear interpolating polynomial. In BNs the computation of the coefficients and is particularly fast since for these models computationally efficient propagation techniques have been established. But these exist, albeit sometimes less efficiently, for other models as well (see e.g. Cowell et al. (1999) for chain graphs). Within our symbolic definition, we note however that once the exponent sets , , are identified, then one can simply plug in the values of the indeterminates to compute these coefficients.
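The linearity of the sensitivity function can be observed in the simplest setting, a one-way analysis, which is the special case of a single full CPT analysis with one varied parameter. The sketch assumes a hypothetical BN Y1 -> Y2, varies the first entry of the distribution of Y1 under proportional covariation, and tracks the probability of the event {Y2 = 1}; all names and numbers are illustrative.

```python
def proportional(theta, i, new_value):
    """Proportional covariation: rescale the covarying parameters."""
    scale = (1.0 - new_value) / (1.0 - theta[i])
    return [new_value if j == i else scale * t for j, t in enumerate(theta)]

theta_y1 = [0.2, 0.3, 0.5]          # hypothetical P(Y1 = 0), P(Y1 = 1), P(Y1 = 2)
p_y2_given_y1 = [0.9, 0.5, 0.2]     # hypothetical P(Y2 = 1 | Y1 = 0, 1, 2)

def sensitivity(x, i=0):
    """P(Y2 = 1) as a function of the varied value x of theta_y1[i]."""
    varied = proportional(theta_y1, i, x)
    return sum(t * q for t, q in zip(varied, p_y2_given_y1))

# A linear function is fully determined by its values at two points.
intercept = sensitivity(0.0)
slope = sensitivity(1.0) - intercept
```

Evaluating `sensitivity` at any other value of `x` reproduces `slope * x + intercept`, so two evaluations of the model are enough to recover the whole function in this example.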
We now deduce the sensitivity function when parameters are varied using the popular proportional scheme.
Corollary 2.
Proof.
For a proportional scheme the coefficients in the definition of a linear scheme equal and . By substituting these expressions into equation (9) we have that
By noting that the result then follows. ∎
It is often of interest to investigate the posterior probability of a target event given that an event has been observed, . This can be represented by the posterior sensitivity function describing the probability as a function of the varying parameters .
Corollary 3.
Under the conditions of Corollary 1, a posterior sensitivity function can be written as the ratio
(10) 
where , , and .
Proof.
The result follows from equation (8) and by noting that . ∎
The form of the coefficients in Corollary 3 can be deduced by simply adapting the notation of equation (7) to the events and for the numerator and the denominator, respectively, of equation (10). Sensitivity functions describing posterior probabilities in BNs have been proven to entertain the form in equation (10). Again, Corollary 3 shows that this is so for any model having a multilinear interpolating polynomial.
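The ratio form of a posterior sensitivity function can be seen in a minimal sketch. It assumes a hypothetical BN Y1 -> Y2, varies the first entry of the distribution of Y1 under proportional covariation, and computes the posterior probability of {Y1 = 0} given {Y2 = 1}; names and numbers are illustrative only.

```python
def proportional(theta, i, new_value):
    """Proportional covariation: rescale the covarying parameters."""
    scale = (1.0 - new_value) / (1.0 - theta[i])
    return [new_value if j == i else scale * t for j, t in enumerate(theta)]

theta_y1 = [0.2, 0.3, 0.5]          # hypothetical distribution of Y1
p_y2_given_y1 = [0.9, 0.5, 0.2]     # hypothetical P(Y2 = 1 | Y1)

def numerator(x):
    """P(Y1 = 0, Y2 = 1) when theta_y1[0] is varied to x: linear in x."""
    return proportional(theta_y1, 0, x)[0] * p_y2_given_y1[0]

def denominator(x):
    """P(Y2 = 1) when theta_y1[0] is varied to x: linear in x."""
    varied = proportional(theta_y1, 0, x)
    return sum(t * q for t, q in zip(varied, p_y2_given_y1))

def posterior(x):
    """P(Y1 = 0 | Y2 = 1) as a ratio of two linear functions of x."""
    return numerator(x) / denominator(x)

midpoint_of_ends = 0.5 * (posterior(0.0) + posterior(1.0))
```

Numerator and denominator are each linear in `x`, but their ratio is not: `posterior(0.5)` differs from `midpoint_of_ends`, which is the value a linear function would take there.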
[Table 2: probability specifications for the context-specific model of Example 2.2.2]
Suppose the context-specific model definition in Example 2.2.2 is completed by the probability specifications in Table 2. Suppose we are interested in the event that parents do not decide for vaccination. Figure 4 shows the sensitivity functions for this event when (on the x-axis) and (on the y-axis) are varied and the other covarying parameters are changed with different schemes. We can notice that for all schemes the functions are linear in their arguments and that more precisely for an order-preserving scheme the sensitivity function is piecewise linear. Notice that whilst uniform and proportional covariation assign similar, although different, probabilities to the event of interest, under order-preserving covariation the probability of interest changes very differently from the other schemes: a property we have often observed in our investigations.
4.3 The Chan-Darwiche distance
Whilst sensitivity functions study local changes, CD distances describe global variations in distributions Chan and Darwiche (2005a). These can be used to study by how much two vectors of atomic probabilities vary in their distributional assumptions if one arises from the other via a covariation scheme. We are then interested in the global impact of that local change.
We next characterize the form of the CD distance for multilinear models in single full CPT analyses, generalizing the form derived in Renooij (2014) for BN models. We demonstrate that the distance depends only on the varied and covaried parameters and is thus very easy to compute.
Proposition 4.
Let , where is a multilinear parametric model and arises from by varying to and to