1 Introduction
At an abstract level, an individual (also referred to as a reasoner) is faced with a domain where by “domain” we simply mean a collection of propositions or concepts which are mathematically encoded as Random Variables (RVs). To arrive at the complete probabilistic knowledge of the domain, i.e., to learn how all RVs in the domain probabilistically interact with one another, is indeed a demanding task. In reality, an individual is often faced with a domain for which she merely possesses
partial knowledge—that is, she only knows how some (not all) RVs in the domain interact. To make the setting under study more tangible, consider the following case. Suppose that the probabilistic knowledge of a domain is represented by a Probabilistic Graphical Model (PGM), e.g., a Bayesian Network (BN). Then the reasoner comes across a new RV
and would like to incorporate it into the model so as to achieve the complete probabilistic knowledge of the enlarged domain (which now also includes the new RV). However, incorporating the new RV would require knowing how it is probabilistically related to all the RVs already present in the model; knowledge which may, quite plausibly, be unavailable to the reasoner. An interesting question that now arises is how to handle situations where only partial knowledge of how the new RV relates probabilistically to the model is available. An example would be when the reasoner merely knows how the new RV interacts probabilistically with a single RV in the model. In this paper, a graphical model, namely, the Multi-Context Model (MCM), is proposed to represent the setting in which only partial probabilistic knowledge of a domain is available to the reasoner. More specifically, the MCM is a graphical language for representing settings in which the Joint Probability Distribution (JPD) over all RVs is not available; what is available instead is the JPDs over a collection of subsets of RVs of the domain (referred to as subdomains or
contexts). These contexts are potentially overlapping, i.e., they could share some RVs. As pointed out elegantly in [Pearl1990], “this state of partial knowledge is more common, because we often begin thinking about a problem through isolated frames, paying no attention to interdependencies.” Along the same line of thought, it is plausible to assume that the probabilistic knowledge of the domain at an early, primitive stage consists of a collection of disjoint contexts and that, as the reasoner acquires more knowledge of how the variables in the model are related to one another and thus probabilistically interact, contexts gradually go through a process very much like an evolution: contexts start to share some variables, overlaps begin to emerge and, once enough knowledge is obtained, a number of contexts could merge, thereby giving rise to bigger contexts. This naturally raises the following fundamental question: How could a collection of consistent, probabilistically sound, and potentially overlapping contexts emerge gradually over the course of time? In an attempt to answer this question, we present a generative process for constructing a contradiction-free MCM. Finally, we would like to note that the special case where the whole domain is modeled as a single context corresponds to the conventional way of modeling the probabilistic knowledge of a domain using a single PGM, e.g., some BN. Yet another crucial question which we address in this work, and which is another motivation behind the development of the MCM, is how the task of inference (i.e., the evaluation of some probability of interest, hereafter referred to as a query) should be carried out in a domain which is modeled according to some MCM. A query does not necessarily belong to any one of the contexts in particular and, in fact, may involve RVs from different contexts.
The paper is structured as follows. After introducing the notation in Sec. 2, we define the MCM in Sec. 3 and, drawing on the notion of probabilistic conditioning, discuss a generative process for constructing a contradiction-free MCM. Then, in Sec. 4 we elaborate on the problem of inference in a multi-context setting, i.e., in a domain whose probabilistic knowledge is encoded as an MCM. In Sec. 5 we discuss relevant past work and comment on the proposed model. Finally, Sec. 6 concludes the paper.
2 Terminology and Notation
In this section we present the mathematical notation and the terminology employed in this paper. Random quantities are denoted by boldfaced letters; their realizations are denoted by the same letters in non-bold type. More specifically, RVs are denoted by lowercase boldfaced letters, e.g., x
, while random vectors are denoted by uppercase bold letters, e.g.,
X. denotes the set of values a random quantity can take; e.g., is the set of all possible realizations of the RV x. In this paper, we assume that all random quantities are discrete. The JPD over the RVs is denoted by ; when the RVs comprise a vector X, then . We will use the notation to denote the sequence of RVs . To simplify the presentation and to prevent our expressions from becoming cumbersome, we incur the following abuse of notation: we denote the probability by for some RV x and its realization . Also, for some , i.e., is the probability that x takes on any value other than . For conditional probabilities we will use the notation instead of . Similar notations will be used for random vectors, i.e., , , and .
The subscript on a probability, e.g., , denotes the minimum value the probability can take subject to the constraints induced by the available probabilistic knowledge. Likewise, the subscript on a probability denotes the maximum value the probability can take. Finally, the operator gives the positive part of its argument, i.e., for any real-valued .
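In symbols, the notation just described amounts to the following (a reconstruction of the definitions above; the particular symbols $P_{\min}$, $P_{\max}$, and $[\cdot]^{+}$ are our own choices):

```latex
% Reconstruction of the notation described above; symbol choices are ours.
P_{\min}(X = x) = \min P(X = x), \qquad P_{\max}(X = x) = \max P(X = x),
% where the minimum and maximum are taken subject to the constraints
% induced by the available probabilistic knowledge, and
[u]^{+} = \max(u, 0) \quad \text{for any real-valued } u .
```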
3 Multi-Context Model
As explained earlier, a domain is simply the set of all Random Variables (RVs) at hand. A context comprises a collection of RVs whose JPD is precisely known; see Fig. 1(a). In general, two contexts can be disjoint (Fig. 1(b)) or overlapping (Fig. 1(c)).
A Multi-Context Model (MCM) encodes the probabilistic knowledge of a domain as a collection of possibly overlapping contexts. This enables the handling of situations in which comprehensive knowledge of a domain is not available but partial information is, in the form of JPDs over some subsets of the domain. Let us first motivate the proposed MCM with a simple yet enlightening example.
3.1 Motivating Example
Consider a domain consisting of the RVs in addition to a set of RVs, . A reasoner has formed a partial belief as to the probabilistic connections between the variables of the domain. More specifically, the reasoner knows precisely the JPDs and but not the JPD . This setting is described by an MCM consisting of two disjoint contexts, one associated with the RVs and the other with , as shown in Fig. 2.
Assume that the following query is posed: given the available information, what can be said about for some ? The RVs y and belong to different contexts; therefore, the JPD of y and , , is not available. The best one can hope for is to derive the range within which varies, namely,
. Let us for the moment assume the objective is to find
. Based on the conventional methodology, i.e., the approach adopted by past work (cf. [Andersen and Hooker1990, Andersen and Hooker1994, Hansen et al.1995] and references therein), one has to write down all the information as a list of linear equations and solve it as a Linear Program (LP). The main drawback of the conventional approach is that it cannot distinguish between what information is relevant and what is irrelevant to the posed query, and hence what does and what does not need to be considered in answering the query. The price paid for this is that the number of parameters required merely to formulate the query as an LP is exponential in
. The key point, however, is that which information is relevant (or irrelevant) depends directly on the posed query, i.e., it is query-dependent. The main advantage of the proposed MCM over previous approaches is that it enables answering a query in a computationally efficient manner by distinguishing the relevant information from the irrelevant for the given query. This is realized through adopting the notion of an inference grammar, a concept which will be systematically defined later. For our example, following the inference rule we will provide in Sec. 4.2, one can easily get .
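The flavor of such a rule can be illustrated with a short sketch. Assuming the disjoint-context bound takes the classical Fréchet form (the function name and the numerical values below are our own illustration, not taken from the paper):

```python
def frechet_bounds(p_x, p_y):
    """Sharp bounds on the joint probability P(x, y) when only the
    marginals P(x) and P(y) are known (the classical Frechet inequalities)."""
    lower = max(0.0, p_x + p_y - 1.0)   # attained when the events overlap minimally
    upper = min(p_x, p_y)               # attained when one event contains the other
    return lower, upper

# For example, with marginals 0.7 and 0.6 in two disjoint contexts:
lo, hi = frechet_bounds(0.7, 0.6)       # lo is approximately 0.3, hi is 0.6
```

Note that the bound uses only the two intra-contextual marginals; no other RV in either context enters the expression, which is the query-dependence point made above.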
The task of inference in an MCM is carried out at two different levels, which makes it more computationally efficient:

High-Level Reasoning: at this level, through the use of the inference grammar, the relevant quantities are identified (e.g., and in the case of our example).

Low-Level Reasoning: the relevant quantities identified in (i) can then be computed by employing inference algorithms which take advantage of the potentially rich independence structure governing the contexts. For example, it could very well be the case that a large number of conditional independence relations hold for the JPD associated with . In that case, stating the derivation of (i.e., ) as an LP would be not only computationally inefficient (the number of parameters required just to state the problem as an LP is exponential in ) but also unnecessary. Indeed, the task of finding could be accomplished in a computationally efficient way using one of the many inference methods developed for probabilistic graphical models; a key point that previous approaches do not take advantage of.
As a final step, in order to derive the lower/upper bound on the posed query, the problem over the quantities identified in (i) and subsequently calculated in (ii) is stated and solved as an LP.
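To make the final LP step concrete on our own toy example (binary RVs; representation and numbers are our assumptions): once the low-level step has supplied the two relevant marginals, the LP over the joint reduces to a single free parameter, which can be scanned by brute force. The optimum coincides with the closed-form Fréchet lower bound:

```python
def min_joint_brute_force(a, b, steps=10_000):
    """Minimize P(x=1, y=1) over all joint distributions of two binary RVs
    consistent with the marginals P(x=1) = a and P(y=1) = b.  Once p11 is
    fixed, the remaining cells are determined by the marginal constraints,
    so the LP reduces to a one-dimensional feasibility scan."""
    best = None
    for k in range(steps + 1):
        p11 = k / steps
        p10, p01 = a - p11, b - p11
        p00 = 1.0 - a - b + p11
        if min(p10, p01, p00) >= -1e-12:        # feasible joint distribution?
            best = p11 if best is None else min(best, p11)
    return best
```

The scan recovers max(0, a + b - 1); with more RVs per context, the number of LP parameters grows exponentially, which is exactly why identifying the relevant quantities first pays off.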
3.2 Generative Process of Contradiction-Free MCMs
The objective of the generative process described in this section is to provide a way to consistently construct contexts, in a sequential manner, over a set of RVs (that is, without introducing any form of contradictory result with respect to any probability assignment). The act of constructing a context, i.e., of assigning a JPD to a subset of RVs, corresponds to forming a subjective belief over those RVs. (One must not interpret the subjectivity of belief as “total disconnection from reality.” We adopt the Bayesian interpretation of probability in this section; the avid reader is referred to [Chalmers1976]. An adherent of the frequentist interpretation of probability could think of contexts as being empirically constructed from collections of data, and thus skip Sec. 3.2 and proceed directly to the next section.) In this light, the act of constructing multiple contexts corresponds to gradually forming subjective beliefs over a number of subsets of variables in the domain; hence every context symbolizes an established belief over the RVs involved in that context.
We introduce this problem by considering a simple case shown in Fig. 3(a).
Suppose there are three RVs, namely, and z, present in the domain, and consider the following question: could one assign and freely and gradually, in a consistent manner, over the three variables without introducing any sort of contradiction? It is easy to verify that the answer is positive. Indeed, one could start off by assigning . This assignment would, of course, induce the marginal , and one can write . Then, to complete the task, one would just need to proceed with assigning . This process can be referred to as a generative process for the assignment of and over , and z, carried out gradually and without introducing any inconsistencies. Here, free assignment refers to the act of freely assigning the non-induced part, e.g., , of the to-be-formed belief, e.g., . In other words, free assignment signifies the observation that the already-formed belief does not impose any constraints on the non-induced part of the to-be-formed belief.
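This chain-shaped case can be played out numerically. In the sketch below (binary RVs, with arbitrary illustrative numbers of our own choosing), the first context is assigned freely, its marginal over the shared variable is induced, and the second context is completed by freely assigning only its non-induced conditional part:

```python
# Step 1: freely assign the first context P(x, y)  (illustrative numbers).
p_xy = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

# Step 2: this assignment induces the marginal P(y).
p_y = {y: sum(p_xy[(x, y)] for x in (0, 1)) for y in (0, 1)}

# Step 3: only the non-induced part P(z | y) is assigned freely; the second
# context is then P(y, z) = P(z | y) P(y), leaving no room for contradiction.
p_z_given_y = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.1, (1, 1): 0.9}   # key: (z, y)
p_yz = {(y, z): p_z_given_y[(z, y)] * p_y[y] for y in (0, 1) for z in (0, 1)}

# Both contexts agree on the shared marginal P(y) by construction.
for y in (0, 1):
    assert abs(sum(p_yz[(y, z)] for z in (0, 1)) - p_y[y]) < 1e-12
```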
Let us now consider the case shown in Fig. 3(b). Could one assign and freely and gradually in a consistent manner over the three variables without introducing any sort of contradiction? After some investigation, one can see that the answer is negative [Pearl1985]. Not surprisingly, the reason has to do with the existence of a loop in the model: once and are assigned ( is induced by the assignment of ), then cannot be assigned freely. This is due to the fact that has to satisfy some nontrivial conditions imposed by the already assigned contexts and [Pearl1985].
In summary, whenever it comes to generating a new context, the JPD associated with that context has to be separated into two parts: (i) the part induced by the already existing contexts, and (ii) the part containing new variables which have so far never been associated with any context (i.e., the non-induced part). The key point in the generation of contradiction-free MCMs is that the former part has to be induced by some context which, itself, is already present in the domain. That is, all the induced parts have to be already contained within some context. Otherwise, to include the induced parts—each constrained by the context it already belongs to—in a new context, the newly created context would have to satisfy some nontrivial constraints and therefore could not be freely assigned.
Let us discuss one final case to further clarify the process. Consider the multi-context model in Fig. 4. Could this model be constructed freely and gradually in a probabilistically consistent manner? The answer is positive. We first assign , then we assign , where is induced by our first assignment of . Finally, we assign , where is induced by our first assignment of . A closer look reveals that this is not the only way to gradually construct a contradiction-free model in this case: we could have performed the assignments in a different order. (This is not always the case: suppose there are four RVs in the domain, namely, and d, and we would like to assign and . Performing the assignments in the order would not introduce any inconsistencies, in contrast to using the order .) Of course, the only thing that would have been different is the induced probabilities. That is, if one performs the assignments in the order (1) , (2) , (3) , then the first assignment will induce for the second assignment , and the second assignment will induce for the third assignment .
4 Inference in MCMs
In this section we consider evidential inference problems in multi-context settings. The objective is to evaluate (to the extent possible) a probability of the form , called a query, where O and E are two mutually exclusive sets of RVs. The set E is the set of evidence variables, and O is the set of RVs for which we are interested in knowing with what probability they take on the value upon the observation of . In multi-context settings, inference problems can be categorized into two broad classes:

Intra-Contextual Inference Problems: those for which the sets E and O both belong to the same context.

Inter-Contextual Inference Problems: those for which the sets E and O do not belong to a single context and, therefore, more than one context is involved in the inference problem.
In what follows, we will elaborate on these two cases.
4.1 Intra-Contextual Inference Problem
One advantage of MCMs is that, once an inference problem is found to be intra-contextual, one can take advantage of the rich independence structure potentially governing the context to accomplish the task of inference in a computationally efficient way. For instance, if the probabilistic knowledge of a context is represented in the form of a BN, then one can benefit from the variety of exact and approximate methods already developed for BNs. For a comprehensive study of such methods, the reader is referred to [Koller and Friedman2009]. Hence, it is of great interest to have contexts whose probabilistic knowledge can be represented in some form of PGM with a sufficiently rich independence structure, for which inference problems can be solved in a computationally efficient way. For example, if the probabilistic knowledge of a context is to be modeled by some BN, we would like that BN to be as sparsely connected as possible and to enjoy low treewidth, so as to ensure computational efficiency for the task of inference [Chandrasekaran, Srebro, and Harsha2008].
4.2 Inter-Contextual Inference Problem: Inference Grammar
In this section, we turn our attention to the task of inter-contextual inference. The RVs involved in the query of an inter-contextual inference problem do not belong to a single context. For this reason, the answer to the query is inevitably an interval, given by a lower and an upper bound on the query. Since the maximum of a probability equals one minus the minimum of the probability of the complementary event, we can focus our attention on the minimization problem (i.e., identifying a lower bound on the probability of interest), realizing that any maximization problem (i.e., identifying an upper bound on the probability of interest) can be cast as a minimization problem and vice versa.
First, we consider some simple queries posed to example MCMs, depicted in Fig. 5(a-c). The goal here is to develop some insight as to which variables are relevant and which are irrelevant for a given query and the corresponding MCM.
We begin by considering a simple case: the disjoint MCM shown in Fig. 5(a). The rule to evaluate is also given in Fig. 5(a). Interestingly enough, the expression only requires the intra-contextual quantities and and does not depend on any other RV present in the domain. In other words, as far as is concerned, the MCM shown in Fig. 5(a) is equivalent to a much simpler MCM: the one corresponding to having only two disjoint contexts described by and . Next, we take the MCM given in Fig. 5(b), where there is an overlap between the context containing X and the one containing Y. The overlapping part consists of the random vector Z. The rule to evaluate is given in Fig. 5(b). Now, consider the MCM shown in Fig. 5(c), where we have the same setting as in the previous case but a new random variable t is added in the overlapping region. Notice that the expression for given in Fig. 5(c) is the same as that given for in Fig. 5(b) with substituted for . That is, in Fig. 5(b) and in Fig. 5(c) represent the same thing, namely, “all the variables in the overlapping region,” and in that respect they are ultimately the same. The rules are very much like sentences in predicate logic, in which variables merely serve as placeholders.
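Since Fig. 5 itself is not reproduced in this text, the following is a plausible reconstruction of the two rules discussed (our notation and our reconstruction, not the paper's figure): the disjoint rule is the classical Fréchet lower bound, and the overlapping rule applies that bound conditionally on each realization of the shared Z and then sums:

```latex
% Disjoint contexts, as in Fig. 5(a)  (classical Frechet lower bound):
P_{\min}(X = x,\, Y = y) \;=\; \big[\, P(X = x) + P(Y = y) - 1 \,\big]^{+}

% Contexts overlapping on Z, as in Fig. 5(b)  (Frechet bound applied per z):
P_{\min}(X = x,\, Y = y) \;=\;
  \sum_{z} \big[\, P(X = x,\, Z = z) + P(Y = y,\, Z = z) - P(Z = z) \,\big]^{+}
```

Under this reading, replacing Z by the pair (Z, t) in Fig. 5(c) changes nothing structural, which matches the placeholder observation above.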
The derivation of the rules given in Fig. 5(a-c) is not presented here. However, using the proof presented in Sec. A-II of the Appendix (to identify the relevant variables) and subsequently following the methodology outlined in Sec. A-III of the Appendix (to visualize the partitions and reason about the extent to which they overlap), it should be straightforward to derive the presented rules.
The sample set of rules presented is by no means exhaustive; nonetheless, thanks to the idea of context transformation that will be discussed in Sec. 4.3, they can be applied to a wide range of interesting inter-contextual inference problems. We would like to clarify that our ultimate objective is not to compute and provide the complete set of rules that can answer all possible queries for all possible MCMs, since that set is simply infinite in size. What we need, therefore, is an algorithm, call it , that can provide the answer to the posed query given an MCM as input. The presented rules provide insights into, and hints at, the nature of the algorithm which needs to be devised to ideally handle any arbitrary query posed to any MCM. (Although we believe that the MCMs generated through the generative process outlined in Sec. 3.2 are more cognitively plausible, from a purely mathematical point of view it would be of interest to find an algorithm which can handle any MCM.) In a sense, we can get a glimpse of the nature of through analyzing the presented rules. In other words, the derived rules serve as a lens through which one can study . In Sec. A-I of the Appendix, a simple version of that can handle arbitrary MCMs is outlined.
The motivation behind giving this sample set of rules can now be summarized as follows.

To shed light on the general nature of a rule (which reflects the nature of ). More specifically, to illustrate that a rule enjoys two key properties, namely: (i) scale-invariance, and (ii) resemblance to sentences in predicate logic, in that in both cases variables are mere placeholders. On account of this resemblance, we refer to the collection of rules as an inference grammar.

To demonstrate that a rule tells us which intra-contextual quantities are essential and which are irrelevant for a particular inter-contextual query.

To emphasize the key property that a rule derived under a specific MCM remains valid for, and can be applied to, infinitely many other MCMs, all of which are linked through the notions of nestedness and transformation; hence, generalization is achieved.

To lay the foundation for transformation and nestedness, which both play crucial roles in understanding the underlying machinery behind .
Next, we discuss another key property of the inference rules, namely, scale-invariance. Consider once again the case in Fig. 2. Now let us derive and , where . Using the rule given in Fig. 5(a), one arrives at the following results: and . In other words, the expressions remain the same regardless of the dimension of the quantity of interest, i.e., be it a single RV or a random vector comprising many RVs. In this respect, once again, the inference rules resemble expressions in predicate logic. Intuition for this scale-invariance is provided in Sec. A-III of the Appendix.
It is worth noting that formulates the inter-contextual inference problem as a Linear Programming (LP) optimization (cf. Sec. A-I of the Appendix). The key issues to consider are: (i) which RVs have to be included in the LP, and (ii) the abstraction level at which to encode the RVs identified in step (i) for the LP, i.e., the parametrization of the RVs identified in step (i). In what follows, the concepts of nestedness and transformation are put forth. Once the two are introduced, one can apply a single rule (e.g., the one in Fig. 5(a)) to a much larger number of MCMs; in fact, to infinitely many MCMs.
4.3 Inter-Contextual Inference Problem: Nestedness and Transformation
The nested property, or nestedness, refers to the fact that every MCM can be considered an element of a family of MCMs. That family contains all MCMs which, through marginalization, can produce the original MCM. In such a case we simply say that the nested property holds between the original MCM and the family. The process of going from the original MCM to one of the members of the family is referred to as transformation. For example, the MCM containing the three contexts , , and shown in Fig. 6(a) is a member of a family of MCMs containing two contexts and , shown in Fig. 6(b), one of which is associated with a family of JPDs over x and y (the dash-dotted circle in Fig. 6(b)) which, if marginalized, produce the same and as in the original MCM (the leftmost MCM). Mathematically, the set of all JPDs over the RVs x and y which, if marginalized, produce the specific marginal probability distributions and is denoted by . The notion of the nested property enables us to look at one MCM as a subset of another, larger MCM. The nested property, furthermore, enables one to sort MCMs into a hierarchical construct, as illustrated in Fig. 6, where moving from left to right corresponds to moving from lower to higher levels of the hierarchy.
To convey the idea, consider the case illustrated in Fig. 7. Suppose the query of interest is . Then one can first transform the original (leftmost) MCM into the MCM shown in the middle, and subsequently into the rightmost MCM. Hence, using the rightmost MCM and the rule given in Fig. 5(b), one can write . If we had knowledge of , then the expression given above would have been sufficient to derive . However, since is not known, we need to go through one more step. This is precisely due to, and emphasizes, the fact that by working on the rightmost MCM we implicitly presumed that we were equipped with more knowledge than we really had. Using the middle MCM and the rule given in Fig. 5(a), one can conclude . Altogether (this is due to the observation that, for the function , when , ), . It is worth noting that the same rule would apply if, instead of the random vector R, we were dealing with the random variable a; i.e., to find one could use the same expression given for by substituting in place of in all the expressions. Arguments of this kind are made possible by the idea of transformation, which enables us to analyze the transformed MCM (e.g., the middle one in Fig. 7) rather than the original MCM (the leftmost one in Fig. 7). Furthermore, the concept of transformation highlights a key idea: if a piece of information (i.e., an intra-contextual quantity) is irrelevant in the transformed MCM for the posed query, it must have been irrelevant in the original MCM in the first place. This statement, once again, sheds light on which intra-contextual quantities are relevant or irrelevant for deriving a posed inter-contextual query on a given MCM.
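The composition of the two steps above admits a one-line justification, sketched here in our own notation: a positive-part bound is nondecreasing in the unknown probability, so minimizing it over the feasible interval of that probability amounts to substituting the interval's lower endpoint.

```latex
% u plays the role of the unknown intermediate probability, whose feasible
% minimum u_min was obtained from the rule of Fig. 5(a); since the map
% u \mapsto [a + u - 1]^{+} is nondecreasing,
\min_{u \in [u_{\min},\, u_{\max}]} \big[\, a + u - 1 \,\big]^{+}
  \;=\; \big[\, a + u_{\min} - 1 \,\big]^{+} .
```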
5 Discussion
We now discuss related work so as to build a connection between our work and previous attempts to incorporate partial probabilistic knowledge of a domain into the task of inference.
Attempting to combine Probabilistic Logic and BNs, the authors of [Andersen and Hooker1990, Andersen and Hooker1994] formulate the inference problem as an optimization problem subject to nonlinear constraints, so as to incorporate the conditional independence relations embedded in the BN. In our proposed framework, however, the issue of dealing with conditional independence relations does not arise at all, because these relations are dealt with during the derivation of the intra-contextual probabilities.
The authors of [Hansen et al.1995] point out that one can avoid nonlinear optimization when the value of a conditional probability is at least imprecisely known. For example, the constraint , if the value of is known either precisely or imprecisely within some interval , can be written as
Hence, the independence can be formulated as a number of linear constraints. However, the main drawback of this approach is that encoding a conditional independence relation such as requires introducing into the optimization problem a number of linear equations that is exponential in [Andersen and Hooker1994].
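To illustrate the contrast in our own notation (over the cell probabilities of the joint, which are the LP's decision variables): the defining relation of a conditional probability is a product of two unknowns, but it becomes linear once the conditional's value is pinned down, either precisely or to an interval.

```latex
% Nonlinear in general: both factors are decision variables of the program.
P(x, z) \;=\; P(x \mid z)\, P(z)

% Linear once c = P(x | z) is known, or known to lie within [c_1, c_2]:
P(x, z) - c\, P(z) \;=\; 0,
\qquad
c_1\, P(z) \;\le\; P(x, z) \;\le\; c_2\, P(z)
```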
Drawing on the idea of Context-Specific Independence (CSI) [Boutilier et al.1996], the authors of [Geiger and Heckerman1991] propose the Bayesian Multinet model, which aims at taking advantage of existing CSIs to perform inference by modeling a single BN as multiple context-specific BNs. Translated into our multi-context setting, the Bayesian Multinet model corresponds to the case where the whole domain is modeled as a single BN, i.e., a single-context MCM, that can be decomposed into multiple BNs, each valid for a specific instantiation of some RVs in the domain.
The authors of [Thone, Guntzer, and Kiebling1992] point out the same concerns which led us to propose the MCM, namely: (i) if unverified (in)dependencies are imposed between the variables in the domain, implausible results may arise; (ii) PGMs require one to have complete probabilistic knowledge of a domain, which may not be available. Motivated by these concerns, [Thone, Guntzer, and Kiebling1992] gives a collection of rules to carry out inference in a domain. Broadly speaking, this work is similar in spirit to ours, with the main distinction being the level of abstraction chosen to perform inference. In [Thone, Guntzer, and Kiebling1992], inference is performed in a very local and rule-based fashion, and conditional independence relations are dealt with directly, which complicates the task at hand and becomes untenable for domains with many variables. In our case, by introducing the notion of context and encoding conditional independence relations within contexts, we avoid having to contemplate the intra-contextual inference problem at the inter-contextual level and leave this task to the corresponding context. This way, we can take advantage of the possibly rich independence structure governing the context and carry out intra-contextual inference in a computationally efficient manner.
Finally, let us discuss some interesting aspects of the proposed model.
The degree of belief is encoded mathematically in the form of a probability distribution over the variables contained within a context. Furthermore, in the process of partial belief formation (which leads to the formation of contexts), the reasoner is ignorant as to how the various contexts probabilistically interact, except that some contexts may share a number of variables and hence overlap. Later on, in the derivation of a query posed to the reasoner, this ignorance manifests itself in the uncertainty region represented by the min/max values for the inter-contextual query of interest. In other words, if the reasoner is ignorant of the (in)dependency structure governing the variables present in the domain, then later, in the derivation of the posed query, the reasoner pays the price by arriving merely at a probability interval, rather than a point probability, as the answer to the query of interest. Knowledge of the underlying dependency structure is fundamental, and its availability to the reasoner should be seen not as an inevitability but as a privileged position.
The evolutionary process of the MCM does not enforce a specific gradual expansion path, for the claim of the MCM is merely that any partial belief formation over the domain can be modeled in the framework the MCM depicts. That is, the reasoner may arrive at different MCMs, depending on the order in which she encounters different concepts and on her background knowledge as to the nature of the potential connections between collections of variables. Simply put, the order in which the reasoner comes to know the concepts or propositions of the domain does matter (cf. the discussion on the order of belief formation in Sec. 3.2).
The MCM enables one to carry out inference without having to commit to any unjustified or uncertain independence assumptions. In this light, contexts symbolize the regions of the domain over which an (in)dependence structure is presumed; hence, the growth and merging of contexts indicates the formation of new (in)dependence structures over parts of the domain which were previously unstructured. In short, the MCM is meant to be invoked in circumstances where the observations and the a priori knowledge combined are not sufficient for the reasoner to form the full JPD over all of the domain variables and yet, quite crucially, the reasoner is reluctant to submit to any unjustified assumptions to compensate for this inadequacy of knowledge.
6 Conclusion
In an attempt to establish a middle ground between Bayesian Logic and Probabilistic Logic [Andersen and Hooker1990, Andersen and Hooker1994] on one side, and PGMs (for instance, Bayesian Networks [Pearl1986], Markov Networks [Koller and Friedman2009], and Chain Graphs [Buntine1995]) on the other, we proposed the Multi-Context Model to represent the state of partial knowledge regarding a domain. A generative process for the gradual construction of contradiction-free MCMs was discussed. The task of inference for MCMs was studied and, along the way, the notions of inference grammar, nestedness, and transformation were introduced. A short version of without the scale-invariance property was provided in the Appendix. It is worth noting that the scale-invariance property can be achieved with a minor change to the last step of the proposed algorithm.
Appendix
AI : A short version of the algorithm without the scale-invariance property
The algorithm aims at minimally parameterizing the information contained in an MCM so that the posed inter-contextual query can be stated as an LP with the fewest number of parameters. As pointed out earlier in Sec. 4.2, the algorithm has to decide on the following: (i) which RVs have to be included in the LP, and (ii) the abstraction level required to minimally encode the information on the RVs identified in step (i) for the LP; in our case, the parametrization of the identified RVs.
In what follows, a simple algorithm is sketched which only performs (i) and ignores (ii). In other words, it identifies the relevant RVs needed to derive the exact lower/upper bound for the inter-contextual query; however, it does not aim at minimally encoding them into the LP (to read more on this, the reader is referred to the discussion on the scale-invariance property in Sec. 4.2 and Sec. AIII of the Appendix). The algorithm consists of three steps:

(1) Identify all the RVs involved in the posed query (i.e., the random vectors and RVs appearing in the query at hand).

(2a) If any two of the already identified RVs belong to two overlapping contexts, identify all the overlapping RVs between these two contexts (e.g., in Fig. 5(b), where step (1) identifies two RVs lying in overlapping contexts, the random vector in the overlapping region must be identified as well).

(2b) If any two of the already identified RVs belong to two contexts connected through a chain of overlapping contexts, identify all the RVs contained in all the overlapping regions along the chain of contexts.

(3) Parameterize only the RVs identified in steps (1), (2a), and (2b), and remove all the other RVs from the MCM; there is no need to encode the information on any RVs not identified in steps (1), (2a), and (2b).
It should be noted that whether the posed query involves minimization or maximization does not affect which RVs are identified by the algorithm. Finally, it is worth noting that with a minor modification to step (3), the scale-invariance property can be achieved. The modification has to do with the question of how to minimally encode the information on each RV identified in steps (1), (2a), and (2b).
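The identification steps above can be sketched in code. The following is a minimal illustration under assumed data structures: contexts are plain sets of RV names, and the function names are ours, not the paper's. For simplicity the sketch traverses one (shortest) chain between each pair of relevant contexts; in MCMs with loops, more than one connecting chain may exist and each relevant chain would have to be traversed.

```python
from collections import deque
from itertools import combinations

def identify_relevant_rvs(contexts, query_rvs):
    """Sketch of the RV-identification steps.

    contexts  : list of sets of RV names (one set per context)
    query_rvs : set of RV names appearing in the posed query
    """
    identified = set(query_rvs)                       # step (1)

    # Overlap graph: contexts are nodes; edges join overlapping contexts.
    n = len(contexts)
    adj = {i: [j for j in range(n)
               if j != i and contexts[i] & contexts[j]] for i in range(n)}

    # Contexts containing at least one already-identified RV.
    relevant = [i for i in range(n) if contexts[i] & identified]

    # Steps (2a)/(2b): for every pair of relevant contexts connected by a
    # chain of overlapping contexts, add all RVs in the overlaps along the
    # chain.  A direct overlap (2a) is just a chain of length one.
    for a, b in combinations(relevant, 2):
        path = _shortest_chain(adj, a, b)
        if path is None:
            continue
        for i, j in zip(path, path[1:]):
            identified |= contexts[i] & contexts[j]

    return identified                                 # step (3) keeps only these

def _shortest_chain(adj, src, dst):
    """BFS over the overlap graph; returns a context chain or None."""
    prev, frontier = {src: None}, deque([src])
    while frontier:
        u = frontier.popleft()
        if u == dst:
            node, path = dst, []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                frontier.append(v)
    return None
```

For instance, with three contexts {x, w}, {w, y}, {y, z} and a query over x and z, the sketch returns {x, w, y, z}: the overlap RVs w and y along the connecting chain are identified, while any RV outside the chain would be dropped in step (3).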
To demonstrate the operation of the algorithm on a more complicated MCM that involves loops, consider the example sketched in Fig. 8(a), together with the inter-contextual query posed on it.
Next, we sketch the proof of sufficiency for the algorithm. Let us first state the claim formally and then provide the proof.
AII : Proof of sufficiency for the algorithm
Lemma: Given a posed query and an MCM, if all the information on the RVs identified in steps (1) to (2b) of the algorithm is stated and then solved as an LP, the exact solution (i.e., the min or max) can be derived for the posed query; all the remaining information available in the MCM is irrelevant to the derivation of the query, hence the sufficiency.
Proof: Our proof is constructive. In it we entertain two ideas, namely (i) the idea of the generative process and, particularly, that of conditioning, also used in Sec. 3.2, and (ii) the notion we refer to as the locality of information. Suppose that all the RVs discussed in steps (1) to (2b) of the algorithm are identified. The key insight is that the information on how the remaining RVs probabilistically interact with each other is completely local in nature and, therefore, irrelevant to the derivation of the posed query. To see this, one can start off with the identified RVs and then, in a gradual fashion, add on the rest of the RVs through the idea of conditioning discussed in Sec. 3.2; this rests on the fundamental property that a JPD can be expanded using the chain rule of probability in an arbitrary order. Quite crucially, this very process of adding the non-identified RVs to the model can be carried out completely locally, i.e., without imposing any constraints on how the identified RVs probabilistically interact. The mere fact that those RVs can be added into the model (i) subsequent to the identified ones, and (ii) without inducing any constraints on the identified ones, deems them irrelevant to the derivation of the query.

AIII Scale-Invariance Property: Intuition
Here, we provide a proof for the example on the scale-invariance property given in Sec. 4.2. Although the proof is given for a special query, the methodology used in it provides an insightful way of visualizing an inference problem. The idea behind the proof is very simple and relies on visualizing the connection of an RV to the underlying sample space using Venn diagrams. Without loss of generality, we assume that all the RVs present in the domain are binary (the generalization of the argument to non-binary RVs is straightforward). The random vector X partitions the sample space into disjoint regions, each of which corresponds to a realization of X; if each realization is coded as a binary number (i.e., binary-coding the realizations), a random vector comprising n binary RVs induces 2^n such regions. Let us index the partitions by their corresponding realization of X. An illustrative example of the partitioning of the sample space induced by the random vector X is depicted in Fig. 9(a), and the partitioning induced by the RVs y and z is sketched in Fig. 9(b). We note that mere knowledge of the distribution function of a random quantity does not provide one with knowledge of the underlying partitions. For this particular example, since the full JPD is not available, the knowledge of how the partitions induced by y and z (Fig. 9(b)) and the ones induced by X (Fig. 9(a)) interact, i.e., to what extent they overlap, remains unspecified. Therefore, to minimize (maximize) the posed query, the probability of the joint event on y and z has to be minimized (maximized). Pictorially, this corresponds to minimizing (maximizing) the overlap between the partitions corresponding to the two events; hence, very simply, the minimal overlap equals max(0, P(y=1) + P(z=1) - 1) and the maximal overlap equals min(P(y=1), P(z=1)).
The key point, which yields the scale-invariance property, is that to derive the minimum (maximum) overlap between the partitions corresponding to the two events, the information as to how the other partitions (those corresponding to the other realizations of the RVs present in the model) interact with one another neither needs to be known nor needs to be encoded into the LP; consequently, the information on those other realizations need not be encoded at all. Hence, the only pieces of information that are required to be encoded and then solved as an LP are the probabilities of the two events themselves. The same line of reasoning can be adopted for the other realizations. The idea of scale-invariance, therefore, aims to avoid encoding information about induced partitions that is irrelevant to the derivation of the posed query; one needs to encode solely the relevant pieces into the LP.
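The Venn-diagram argument above can be checked numerically. The sketch below (a hypothetical helper, not from the paper) brute-forces the tiny LP for two binary RVs y and z: it sweeps over all joint distributions consistent with the given marginals P(y=1) and P(z=1) and records the smallest and largest attainable P(y=1, z=1), which coincide with the classical Fréchet bounds max(0, P(y)+P(z)-1) and min(P(y), P(z)).

```python
def joint_bounds(p_y, p_z, steps=1000):
    """Brute-force stand-in for the LP: over all joints of two binary RVs
    consistent with the marginals P(y=1)=p_y and P(z=1)=p_z, return the
    minimal and maximal attainable P(y=1, z=1)."""
    lo, hi = 1.0, 0.0
    for k in range(steps + 1):
        p11 = k / steps              # candidate P(y=1, z=1)
        p10 = p_y - p11              # P(y=1, z=0), fixed by the y-marginal
        p01 = p_z - p11              # P(y=0, z=1), fixed by the z-marginal
        p00 = 1.0 - p11 - p10 - p01  # P(y=0, z=0), fixed by normalization
        if min(p11, p10, p01, p00) >= -1e-9:  # LP feasibility: all cells >= 0
            lo, hi = min(lo, p11), max(hi, p11)
    return lo, hi

# With P(y=1) = 0.7 and P(z=1) = 0.6, the overlap can range from 0.3 to 0.6,
# matching max(0, 0.7 + 0.6 - 1) and min(0.7, 0.6).
```

Note that only the two marginals enter the computation; nothing about the other RVs (or the other realizations) in the model is needed, which is precisely the point of the scale-invariance property.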
Acknowledgement
The authors would like to thank the anonymous reviewers for their valuable comments.
This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) under grant RGPIN 262017 and by the Fonds Québécois de la Recherche sur la Nature et les Technologies (FQRNT).
References
 [Andersen and Hooker1990] Andersen, K. A., and Hooker, J. N. 1990. Probabilistic logic for belief nets. In International Congress of Cybernetics and Systems. New York City.
 [Andersen and Hooker1994] Andersen, K. A., and Hooker, J. N. 1994. Bayesian logic. Decision Support Systems 11(2):191–210.
 [Boutilier et al.1996] Boutilier, C.; Friedman, N.; Goldszmidt, M.; and Koller, D. 1996. Context-specific independence in Bayesian networks. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, 115–123. Morgan Kaufmann Publishers Inc.

 [Buntine1995] Buntine, W. L. 1995. Chain graphs for learning. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 46–54. Morgan Kaufmann Publishers Inc.
 [Chalmers1976] Chalmers, A. F. 1976. What is this thing called science? Hackett Publishing.
 [Chandrasekaran, Srebro, and Harsha2008] Chandrasekaran, V.; Srebro, N.; and Harsha, P. 2008. Complexity of inference in graphical models. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, 70–78.
 [Geiger and Heckerman1991] Geiger, D., and Heckerman, D. 1991. Advances in probabilistic reasoning. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, 118–126. Morgan Kaufmann Publishers Inc.
 [Hansen et al.1995] Hansen, P.; Jaumard, B.; Nguetse, G. D.; and Aragao, M. P. D. 1995. Models and algorithms for probabilistic and Bayesian logic. Citeseer.
 [Koller and Friedman2009] Koller, D., and Friedman, N. 2009. Probabilistic graphical models: principles and techniques. MIT press.
 [Pearl1985] Pearl, J. 1985. Bayesian networks: a model of self-activated memory for evidential reasoning. In Proceedings of the Cognitive Science Society, 329–334.
 [Pearl1986] Pearl, J. 1986. Fusion, propagation, and structuring in belief networks. Artificial intelligence 29(3):241–288.
 [Pearl1990] Pearl, J. 1990. Reasoning with belief functions: An analysis of compatibility. International Journal of Approximate Reasoning 4(5):363–389.
 [Thöne, Güntzer, and Kießling1992] Thöne, H.; Güntzer, U.; and Kießling, W. 1992. Towards precision of probabilistic bounds propagation. In Uncertainty in Artificial Intelligence, 315–322.