Probability theory, developed over the last three centuries, has provided an overarching framework for modeling uncertainty in the real world. As a result, it has become a key mathematical tool for designing state estimation and inference algorithms. While Pascal, Fermat, and Huygens were among the first contributors to probability theory, Pierre-Simon Laplace and Thomas Bayes were among the first to formulate the notion of conditional probability and to use it to estimate an unknown parameter from observed data [1, 2]. Since its mathematical formulation, and more so after the axiomatic foundations laid by Kolmogorov, the theory of probability has formed the basis for inference and state estimation algorithms.
In Bayesian inference, for example, the goal is to successively improve an estimate of a model parameter or of an evolving state variable, such as the pose of a robot, by incorporating observed information [5, 6]. A prior probability distribution is assigned to the initial state variable or the model parameter, and this distribution is successively refined by computing the posterior distribution from the observed data. Bayes’ law and the law of total probability form the theoretical basis for this computation.
One of the main difficulties in such state estimation and inference procedures is computational tractability. Computing the posterior distribution and the maximum a posteriori (MAP) estimate is hard in most problems of practical significance. Several approximation algorithms have been proposed to overcome this limitation, and this remains an active field of research.
Probability theory, in its simplest form, characterizes an uncertain quantity by a distribution function, which assigns weights over the set of all possible outcomes; more generally, it defines a probability measure over that set. This distribution function is often too much information to carry through a computation, and as a result leads to computationally intractable solutions. The difficulty in computing the posterior distribution is one manifestation of this intractability.
Another major issue with distribution functions is that they are often chosen mainly to simplify analysis and algorithm design. In robotic perception, for example, additive Gaussian noise is often assumed in the motion and sensing (observation) models. Although this yields a simple Kalman-filter-like solution to the robotic perception problem, it can cause severe degradation in performance because of the nonlinearities inherent in motion and sensing. In several such applications, and in robotic perception in particular, a bounded noise model is more appropriate.
In this work, we provide an alternative framework of uncertainty variables for modeling uncertainty. A simple uncertainty variable $X$ is characterized by an uncertainty set $\llbracket X \rrbracket$, and a realization of an uncertainty variable can lie only in its uncertainty set. Conditional uncertainty is characterized by a conditional uncertainty map $\llbracket X \mid \cdot \rrbracket$ that maps every realization $y$ of an uncertainty variable $Y$ to a subset $\llbracket X \mid y \rrbracket$ of $\mathcal{X}$, the set of all realizations of $X$; i.e., given a realization $y$, a realization of $X$ can only lie in the set $\llbracket X \mid y \rrbracket$. Thus, the larger the set $\llbracket X \mid y \rrbracket$, the larger the uncertainty in $X$ given $Y = y$.
Using the notions of uncertainty variables and conditional uncertainty maps, we first prove equivalents of Bayes’ law and the law of total probability for uncertainty variables. We then define the notions of independence, conditional independence, and pairwise independence for a collection of uncertainty variables, and show that they preserve the properties of independence defined over random variables. For example, we show that pairwise independence of a collection of uncertainty variables does not imply their total independence.
Graphical models over random variables have been very useful in designing exact and approximate inference algorithms [7, 8]. We extend the theory of uncertainty variables developed in the first part of the paper to define graphical models over uncertainty variables. We define the Bayesian uncertainty network as a directed graphical model over a collection of uncertainty variables; as the name suggests, it is the counterpart of the Bayesian network defined over random variables. We show that all the conditional independence properties expected of a Bayesian network also hold for the Bayesian uncertainty network.
In many state estimation and inference problems, one is interested in a point estimate. We therefore define the notion of a point estimate for a Bayesian uncertainty network, and prove that it equals the MAP estimate of a Bayesian network under an appropriate representational relation between the Bayesian network and the Bayesian uncertainty network. This illustrates the generality of this new approach to characterizing uncertainty.
I-B A Brief Literature Survey
Using bounded sets instead of probability distributions is not a new idea; it has been explored in the control systems literature [9, 10, 11]. Some of these early works on bounded noise models in control theory also inspired the formulation of set estimation in the signal processing literature [12, 13, 9]. The motivation there was that a point estimate, such as the MAP or maximum-likelihood (ML) estimate, is not always sufficient, and that a confidence region, namely a set, would be useful. A set estimate, for example of a model parameter, was defined as an intersection of sets, each of which corresponded to an observation. Several approximation methods were proposed to help compute such intersections, especially of ellipsoidal sets [14, 15].
This line of work differs fundamentally from the theory of uncertainty variables developed here, since we are not interested in set estimates. We use uncertainty sets only as a simpler representation than the distribution function, one that yields a computational benefit. In Section V, for example, we develop a point estimate and show its relation with the MAP estimate. Moreover, the notions of conditional uncertainty maps, independence, and graphical models do not exist in the set-estimation literature.
Robust optimization begins with the same premise as ours: the way probability theory characterizes uncertainty results in computational intractability. As a recourse, when many uncertain quantities are involved, robust optimization constructs uncertainty sets over these quantities using the law of large numbers and the central limit theorem. Our work, on the other hand, uses uncertainty sets to develop an alternative framework for modeling uncertainty in a single variable. Again, the notions of conditional uncertainty maps, independence, and graphical models are not relevant there, and therefore do not appear in the robust optimization literature.
I-C Organization and Notations
In Section II, we develop the notion of uncertainty variables and establish equivalents of Bayes’ law and the law of total probability. In Section III, we define independence for a collection of uncertainty variables. Bayesian uncertainty networks and point estimates are discussed in Sections IV and V, respectively. We conclude in Section VI.
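We use the following notation. For an indexed set $\{a_i\}_{i \in \mathcal{I}}$, $a_S$ or $\{a_i\}_{i \in S}$ denotes the collection $\{a_i : i \in S\}$, where $S \subseteq \mathcal{I}$. We use $[N]$ to denote the set of integers $\{1, 2, \ldots, N\}$. Uncertainty variables are usually denoted by $X$, $Y$, and $Z$, while random variables are denoted by $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{Z}$.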
II Theory of Uncertainty Variables
Suppose we want to characterize an uncertain quantity, such as the temperature in a room, the position of a robot, or an atmospheric condition measured by several variables. All such uncertain quantities have an implicitly defined underlying domain. For example, a temperature measurement can take any real value, and the pose of a robot is a point in a $d$-dimensional space of poses. Moreover, any such uncertain quantity is more likely to lie in a certain region of this domain than to be spread out everywhere.
Motivated by this, we define the notion of an uncertainty variable and a simple uncertainty variable. It is the simple uncertainty variable that will be of interest to us.
An uncertainty variable (UV) is defined as a tuple , where is the domain of the UV and is an uncertainty map that maps every point in to a subset of :
We say that a UV is simple if for all , either or .
A simple UV can always be represented by an uncertainty set:
Simple or not, any realization of the UV lies in the uncertainty set . For ease of notation, we will use to denote a simple UV. We provide the definitions and proofs for UVs in full generality in Appendix A. Here, we summarize some of the results for the specific case of simple uncertainty variables.
Consider the following examples of simple uncertainty variables:
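1) Elliptic UV: An elliptic UV is defined as
$$\llbracket X \rrbracket = \{x \in \mathbb{R}^n : (x - \mu)^\top P \, (x - \mu) \le 1\},$$
where $P$ is a positive definite matrix and $\mu$ is a vector in $\mathbb{R}^n$. This UV can be used to model a noisy measurement of a location $\mu$.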
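2) Polytopic UV: A polytopic UV is defined as
$$\llbracket X \rrbracket = \{x \in \mathbb{R}^n : a \le A x \le b\},$$
where $A$ is an $m \times n$ matrix, $a$ and $b$ are vectors in $\mathbb{R}^m$, and the inequalities hold componentwise.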
3) Canonical UV: For every random variable $\mathsf{X}$ taking values in $\mathbb{R}^n$ with a probability density function, we can construct a simple UV $X$, which we call the canonical UV of $\mathsf{X}$. The simple canonical UV is given by
The polytopic UV above, for instance, is a canonical UV for a random variable uniformly distributed over the polytope.
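The elliptic and polytopic examples translate directly into membership tests. The sketch below is illustrative only: it assumes the elliptic set has the form $\{x : (x-\mu)^\top P (x-\mu) \le 1\}$ with $P$ positive definite, and the polytopic set the form $\{x : a \le Ax \le b\}$ componentwise. The helper names `in_elliptic_uv` and `in_polytopic_uv` are hypothetical.

```python
import numpy as np


def in_elliptic_uv(x, mu, P):
    """Assumed elliptic set: x is a realization iff (x - mu)^T P (x - mu) <= 1."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return bool(d @ np.asarray(P, dtype=float) @ d <= 1.0)


def in_polytopic_uv(x, A, a, b):
    """Assumed polytopic set: x is a realization iff a <= A x <= b componentwise."""
    Ax = np.asarray(A, dtype=float) @ np.asarray(x, dtype=float)
    lower_ok = np.all(np.asarray(a, dtype=float) <= Ax)
    upper_ok = np.all(Ax <= np.asarray(b, dtype=float))
    return bool(lower_ok and upper_ok)


# Ellipse centred at the measured location mu, with semi-axes 1/2 (first axis) and 1 (second).
mu = np.array([1.0, 2.0])
P = np.diag([4.0, 1.0])
print(in_elliptic_uv([1.4, 2.0], mu, P))  # True:  4 * 0.4**2 = 0.64 <= 1
print(in_elliptic_uv([1.6, 2.0], mu, P))  # False: 4 * 0.6**2 = 1.44 > 1

# The box [0, 1] x [0, 2] written as a polytope with A = I.
A = np.eye(2)
print(in_polytopic_uv([0.5, 1.5], A, [0, 0], [1, 2]))  # True
print(in_polytopic_uv([1.5, 1.5], A, [0, 0], [1, 2]))  # False
```

A box is the simplest polytope; any other set of linear inequalities can be checked the same way by changing `A`, `a`, and `b`.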
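A joint simple uncertainty variable $(X, Y)$ can similarly be defined with a domain $\mathcal{X} \times \mathcal{Y}$, an uncertainty map, and an uncertainty set $\llbracket X, Y \rrbracket \subseteq \mathcal{X} \times \mathcal{Y}$.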
We now state an equivalent of the law of total probability.
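The marginal uncertainty sets $\llbracket X \rrbracket$ and $\llbracket Y \rrbracket$ are given by the projections of $\llbracket X, Y \rrbracket$:
$$\llbracket X \rrbracket = \{x \in \mathcal{X} : (x, y) \in \llbracket X, Y \rrbracket \text{ for some } y \in \mathcal{Y}\}, \qquad \llbracket Y \rrbracket = \{y \in \mathcal{Y} : (x, y) \in \llbracket X, Y \rrbracket \text{ for some } x \in \mathcal{X}\}.$$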
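It is not necessarily true that the joint uncertainty set $\llbracket X, Y \rrbracket$ is the cross product of the marginal sets $\llbracket X \rrbracket$ and $\llbracket Y \rrbracket$. In Figure 1, we have plotted the joint uncertainty set of two variables. Here, $\llbracket X, Y \rrbracket \subseteq \llbracket X \rrbracket \times \llbracket Y \rrbracket$, but $\llbracket X, Y \rrbracket \neq \llbracket X \rrbracket \times \llbracket Y \rrbracket$. We therefore define a conditional uncertainty map.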
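Let $X$ and $Y$ be two UVs. The conditional uncertainty map of $X$ given $Y$ is a set function $\llbracket X \mid \cdot \rrbracket : \mathcal{Y} \to 2^{\mathcal{X}}$ given by
$$\llbracket X \mid y \rrbracket = \{x \in \mathcal{X} : (x, y) \in \llbracket X, Y \rrbracket\}$$
for all $y \in \mathcal{Y}$, where $2^{\mathcal{X}}$ denotes the set of all subsets of $\mathcal{X}$.
The conditional uncertainty map takes each $y \in \mathcal{Y}$ to a set in the collection $2^{\mathcal{X}}$. Unlike the marginal uncertainty maps of $X$ and $Y$, the conditional uncertainty map can map a $y$ to any subset in $2^{\mathcal{X}}$. The larger the set $\llbracket X \mid y \rrbracket$, the greater the uncertainty in the UV $X$, given $Y = y$.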
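On a finite domain, the projections that define the marginal sets and the conditional uncertainty map are one-line set comprehensions. The sketch below uses a hypothetical joint set of four pairs $(x, y)$; it shows that the joint set can be strictly smaller than the cross product of its marginals, which is why the conditional map is needed.

```python
# Hypothetical joint uncertainty set [[X, Y]] on a finite domain, stored as (x, y) pairs.
joint = {(0, 0), (0, 1), (1, 1), (2, 2)}


def marginal_x(j):
    """[[X]]: projection of the joint set onto the x-coordinate."""
    return {x for (x, _) in j}


def marginal_y(j):
    """[[Y]]: projection of the joint set onto the y-coordinate."""
    return {y for (_, y) in j}


def conditional_x_given_y(j, y):
    """[[X | y]] = {x : (x, y) in [[X, Y]]}; this is empty when y is not in [[Y]]."""
    return {x for (x, yy) in j if yy == y}


X, Y = marginal_x(joint), marginal_y(joint)
product_set = {(x, y) for x in X for y in Y}
print(sorted(X), sorted(Y))                        # [0, 1, 2] [0, 1, 2]
print(joint <= product_set, joint == product_set)  # True False
print({y: sorted(conditional_x_given_y(joint, y)) for y in sorted(Y)})
# {0: [0], 1: [0, 1], 2: [2]}
```

Here $\llbracket X \mid 1 \rrbracket = \{0, 1\}$ differs from $\llbracket X \rrbracket = \{0, 1, 2\}$, so the conditional map carries information that the marginals alone do not.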
II-A Bayes’ Rule
We now prove an equivalent of Bayes’ rule. We first define a convenient operation between a set and a set function.
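For a set $A \subseteq \mathcal{X}$ and a set function $F : \mathcal{X} \to 2^{\mathcal{Y}}$, we define the cross product $A \times F$ to be the set in $\mathcal{X} \times \mathcal{Y}$ given by
$$A \times F = \{(x, y) \in \mathcal{X} \times \mathcal{Y} : x \in A,\ y \in F(x)\}.$$
Bayes’ rule for uncertainty variables is as follows.
For the joint UV $(X, Y)$, the uncertainty set is given by
$$\llbracket X, Y \rrbracket = \llbracket X \rrbracket \times \llbracket Y \mid \cdot \rrbracket = \llbracket Y \rrbracket \times \llbracket X \mid \cdot \rrbracket,$$
where $\llbracket Y \mid \cdot \rrbracket(x) = \llbracket Y \mid x \rrbracket$ for all $x \in \mathcal{X}$ and $\llbracket X \mid \cdot \rrbracket(y) = \llbracket X \mid y \rrbracket$ for all $y \in \mathcal{Y}$, and the ordering of the two coordinates in each pair is ignored.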
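Bayes’ rule can be checked mechanically on a finite example. The sketch implements the cross product of a set with a set function, using the convention that $A \times F$ collects the pairs $(a, b)$ with $a \in A$ and $b \in F(a)$; `cross` and the sample joint set are hypothetical.

```python
def cross(A, F):
    """Cross product of a set A with a set function F: {(a, b) : a in A, b in F(a)}."""
    return {(a, b) for a in A for b in F(a)}


# Hypothetical joint uncertainty set [[X, Y]] stored as (x, y) pairs.
joint = {(0, 0), (0, 1), (1, 1), (2, 2)}
X = {x for x, _ in joint}  # [[X]]
Y = {y for _, y in joint}  # [[Y]]


def x_given(y):
    """Conditional uncertainty map [[X | y]]."""
    return {x for x, yy in joint if yy == y}


def y_given(x):
    """Conditional uncertainty map [[Y | x]]."""
    return {y for xx, y in joint if xx == x}


# [[X]] x [[Y | .]] yields (x, y) pairs directly.
via_x = cross(X, y_given)
# [[Y]] x [[X | .]] yields (y, x) pairs; swap them to compare, i.e. ignore coordinate order.
via_y = {(x, y) for (y, x) in cross(Y, x_given)}
print(via_x == joint, via_y == joint)  # True True
```

Both factorizations recover the same joint set, which is the content of the rule; only the coordinate order of the pairs differs.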
In the next subsection, we show that uncertainty sets and conditional uncertainty maps can be represented as sub-level sets of functions. This representation will be useful in proving some of the results later in the paper.
We have represented uncertainty variables and conditional uncertainties as either sets or set maps. At times, it is more convenient to work with functions than with sets. In this short subsection, we present a result stating that every uncertainty set or conditional uncertainty map can be represented as a sub-level set of a function.
The following statements are true:
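1) An uncertainty set $\llbracket X \rrbracket$ can be written as
$$\llbracket X \rrbracket = \{x \in \mathcal{X} : f(x) \le c\}$$
for some function $f : \mathcal{X} \to \mathbb{R}^m$, some positive integer $m$, and $c \in \mathbb{R}^m$, where the inequality holds componentwise.
2) A conditional uncertainty map $\llbracket X \mid \cdot \rrbracket$ can be written as
$$\llbracket X \mid y \rrbracket = \{x \in \mathcal{X} : g(x, y) \le d\} \quad \text{for all } y \in \mathcal{Y},$$
for some function $g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^k$, some positive integer $k$, and $d \in \mathbb{R}^k$.
The proof is trivial, as such functions, namely $f$ and $g$, can always be obtained by a simple construction. For the first part, given a set $\llbracket X \rrbracket$, take $f(x) = 1 - \mathbb{1}_{\llbracket X \rrbracket}(x)$ for all $x \in \mathcal{X}$, where $\mathbb{1}_{\llbracket X \rrbracket}$ is the indicator function of the set $\llbracket X \rrbracket$. Take $m = 1$ and $c = 0$. Then $\llbracket X \rrbracket = \{x \in \mathcal{X} : f(x) \le c\}$. Similarly, for the second part, take $g(x, y) = 1 - \mathbb{1}_{\llbracket X \mid y \rrbracket}(x)$, $k = 1$, and $d = 0$.
Note that we have not imposed any conditions on the functions $f$ and $g$ in Lemma 1, except that they take values in some Euclidean space.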
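The indicator construction in the proof of Lemma 1 can be written out directly. This sketch takes $f(x) = 1 - \mathbb{1}_S(x)$ with $c = 0$, and $g(x, y) = 1 - \mathbb{1}_{F(y)}(x)$ with $d = 0$, on a small finite domain; the helper names and the example set and map are hypothetical.

```python
DOMAIN = range(10)


def sublevel_rep(S):
    """First part of Lemma 1: S = {x : f(x) <= c} with f = 1 - indicator of S, and c = 0."""
    def f(x):
        return 1 - (1 if x in S else 0)
    return f, 0


def sublevel_rep_conditional(F):
    """Second part of Lemma 1: F(y) = {x : g(x, y) <= d} with g = 1 - indicator of F(y), and d = 0."""
    def g(x, y):
        return 1 - (1 if x in F(y) else 0)
    return g, 0


S = {2, 3, 5, 7}
f, c = sublevel_rep(S)
print({x for x in DOMAIN if f(x) <= c} == S)  # True


def F(y):
    """Hypothetical conditional uncertainty map: all x with the same residue as y modulo 3."""
    return {x for x in DOMAIN if x % 3 == y % 3}


g, d = sublevel_rep_conditional(F)
print(all({x for x in DOMAIN if g(x, y) <= d} == F(y) for y in DOMAIN))  # True
```

The construction is deliberately trivial; it only establishes existence, and smoother or more structured choices of $f$ and $g$ are what make later computations practical.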
II-C Computing the Posterior Map
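The main advantage of this formulation is that it can make the posterior uncertainty map easier to compute. For example, in many machine learning applications, we are given a model for the data given the parameters, and a prior model for the parameters. In the present framework, this is equivalent to knowing the conditional uncertainty map $\llbracket Y \mid x \rrbracket$ of the data $Y$ given the parameters $X = x$, and the uncertainty set $\llbracket X \rrbracket$. With these, the joint uncertainty set can be computed, by Bayes’ rule, as
$$\llbracket X, Y \rrbracket = \llbracket X \rrbracket \times \llbracket Y \mid \cdot \rrbracket.$$
Then, the posterior uncertainty map can be computed by a simple projection on $\mathcal{X}$:
$$\llbracket X \mid y \rrbracket = \{x \in \mathcal{X} : (x, y) \in \llbracket X, Y \rrbracket\}.$$
This posterior map, for given observed data $y$, produces a set in $\mathcal{X}$ that describes the uncertainty in $X$.
Let us use the sub-level set representations of Lemma 1. Let $\llbracket X \rrbracket = \{x : f(x) \le c\}$ and $\llbracket Y \mid x \rrbracket = \{y : g(y, x) \le d\}$. Then the posterior uncertainty map, for given observed data $y$, is given by
$$\llbracket X \mid y \rrbracket = \{x \in \mathcal{X} : f(x) \le c,\ g(y, x) \le d\}. \tag{11}$$
From (11), we see that this is nothing but the intersection of two sub-level sets, evaluated at the observed $y$ and projected onto $\mathcal{X}$.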
The idea of obtaining set estimates as intersections of sets already exists in the set-estimation literature [12, 13, 9]. However, that literature was mostly limited to linear models, in which the observed data and the underlying state variable are related by a linear equation. The notion of uncertainty variables extends this idea to arbitrary functions $f$ and $g$, so the relation between data and state need not be linear.
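The posterior set in (11) can be approximated on a grid even when the sensing model is nonlinear. The example is hypothetical: a prior set $\llbracket X \rrbracket = \{x : |x| \le 2\}$ (so $f(x) = |x|$, $c = 2$) and a quadratic sensor $\llbracket Y \mid x \rrbracket = \{y : |y - x^2| \le 0.5\}$ (so $g(y, x) = |y - x^2|$, $d = 0.5$); the grid resolution is an arbitrary choice.

```python
import numpy as np


def f(x):
    """Prior constraint function: [[X]] = {x : f(x) <= c}."""
    return np.abs(x)


def g(y, x):
    """Nonlinear sensing constraint: [[Y | x]] = {y : g(y, x) <= d}."""
    return np.abs(y - x**2)


def posterior_set(y, grid, c=2.0, d=0.5):
    """Grid points of [[X | y]] = {x : f(x) <= c and g(y, x) <= d}."""
    mask = (f(grid) <= c) & (g(y, grid) <= d)
    return grid[mask]


grid = np.linspace(-3.0, 3.0, 60001)  # spacing 1e-4
post = posterior_set(1.0, grid)
# Exact answer for y = 1: 0.5 <= x**2 <= 1.5, i.e. sqrt(0.5) <= |x| <= sqrt(1.5).
print(post.min(), post.max())              # approximately -1.2247 and 1.2247
print(bool(np.any(np.abs(post) < 0.7)))    # False: the set is two disjoint intervals
```

The resulting posterior set is nonconvex. Intersecting convex sets, as in linear set estimation with ellipsoids or polytopes, always gives a convex set; here the two separate intervals come from the nonlinear $g$.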
To see that the posterior distribution and the posterior uncertainty map genuinely differ, in the sense that one cannot be trivially constructed from the other, consider the following example. Let and be the canonical uncertainty sets and conditional uncertainty maps for given marginal and conditional density functions and , respectively. Then, the joint uncertainty set is given by
and the posterior uncertainty map is given by
Note that this set is not the same as , for some constant .
In the next section, we define the notions of independence and conditional independence for a given set of uncertainty variables. We show that the independence properties that hold for random variables, such as pairwise independence not implying total independence, are retained for uncertainty variables.
We first define independence between two simple uncertainty variables.
We say that two UVs $X$ and $Y$ are independent if $\llbracket X \mid y \rrbracket = \llbracket X \rrbracket$ for all $y \in \llbracket Y \rrbracket$.
It is easy to see that for independent uncertainty variables $X$ and $Y$, the joint uncertainty set factors into the cross product of the marginal uncertainty sets. We state this in the following lemma.
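Uncertainty variables $X$ and $Y$ are independent if and only if $\llbracket X, Y \rrbracket = \llbracket X \rrbracket \times \llbracket Y \rrbracket$, where $\llbracket X, Y \rrbracket$, $\llbracket X \rrbracket$, and $\llbracket Y \rrbracket$ are the uncertainty sets for $(X, Y)$, $X$, and $Y$, respectively.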
We first prove the following lemma about the operation given in Definition 3.
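Let $A \subseteq \mathcal{X}$ and $F : \mathcal{X} \to 2^{\mathcal{Y}}$. If the mapping $F$ is such that $F(x) = B$ for all $x \in A$, then $A \times F = A \times B$.
Using the definition of $A \times F$, we have
$$A \times F = \bigcup_{x \in A} \{x\} \times F(x) = \bigcup_{x \in A} \{x\} \times B, \tag{17}$$
where the last equality follows from the assumption $F(x) = B$ for all $x \in A$. Now, we can take the union inside the cross product in (17) to get
$$\Big(\bigcup_{x \in A} \{x\}\Big) \times B,$$
which is nothing but $A \times B$.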
where , for any . It now suffices to show that . Using Theorem 1, we get to be
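We now show that if the joint uncertainty set factorizes, i.e., $\llbracket X, Y \rrbracket = \llbracket X \rrbracket \times \llbracket Y \rrbracket$, then $X$ and $Y$ are independent. We therefore have to show that $\llbracket X \mid y \rrbracket = \llbracket X \rrbracket$ for all $y \in \llbracket Y \rrbracket$. We use the definition of the conditional uncertainty map given in Definition 2. For any $y \in \llbracket Y \rrbracket$, the conditional uncertainty map is given by
$$\llbracket X \mid y \rrbracket = \{x \in \mathcal{X} : (x, y) \in \llbracket X \rrbracket \times \llbracket Y \rrbracket\} = \llbracket X \rrbracket,$$
since $y \in \llbracket Y \rrbracket$.
This implies that $\llbracket X \mid y \rrbracket = \llbracket X \rrbracket$ for all $y \in \llbracket Y \rrbracket$.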
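For finite uncertainty sets, both sides of the lemma can be tested directly, and the two tests should always agree. The two example joint sets below are hypothetical, and the function names are illustrative.

```python
from itertools import product


def is_independent(joint):
    """Definition: X and Y are independent iff [[X | y]] == [[X]] for every y in [[Y]]."""
    X = {x for x, _ in joint}
    Y = {y for _, y in joint}
    return all({x for x, yy in joint if yy == y} == X for y in Y)


def factorizes(joint):
    """Lemma: the joint set equals the cross product of its marginal sets."""
    X = {x for x, _ in joint}
    Y = {y for _, y in joint}
    return joint == set(product(X, Y))


independent_example = set(product({0, 1}, {"a", "b", "c"}))
dependent_example = {(0, "a"), (1, "b")}
for j in (independent_example, dependent_example):
    print(is_independent(j), factorizes(j))
# True True
# False False
```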
Conditional independence can be similarly defined. We do so in terms of factorization of the uncertainty maps.
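We say that the UVs $X$ and $Y$ are independent given a UV $Z$ if
$$\llbracket X, Y \mid z \rrbracket = \llbracket X \mid z \rrbracket \times \llbracket Y \mid z \rrbracket$$
for all $z \in \llbracket Z \rrbracket$.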
We will use the notation $X \perp Y$ to denote that $X$ and $Y$ are independent, and $X \perp Y \mid Z$ to denote that $X$ and $Y$ are conditionally independent given $Z$.
When several uncertainty variables are involved, defining independence is as subtle as it is for random variables. However, it turns out that the independence and conditional independence properties that hold for random variables also hold for uncertainty variables. In Section IV, we will introduce Bayesian network models on a collection of uncertainty variables, and we will see that they preserve the conditional independence properties that hold for Bayesian networks defined over random variables.
As a prelude, we define pairwise and total independence for a collection of uncertainty variables. In probability theory, pairwise independence does not imply total independence of a collection of random variables. The same is true for uncertainty variables.
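A collection of uncertainty variables $X_1, X_2, \ldots, X_N$ is said to be
1) pairwise independent if for each $i, j \in [N]$, $i \neq j$, we have
$$\llbracket X_i, X_j \rrbracket = \llbracket X_i \rrbracket \times \llbracket X_j \rrbracket,$$
where $\llbracket X_i, X_j \rrbracket$, $\llbracket X_i \rrbracket$, and $\llbracket X_j \rrbracket$ are the uncertainty sets for $(X_i, X_j)$, $X_i$, and $X_j$, respectively.
2) totally independent if
$$\llbracket X_{[N]} \rrbracket = \llbracket X_1 \rrbracket \times \llbracket X_2 \rrbracket \times \cdots \times \llbracket X_N \rrbracket,$$
where $\llbracket X_{[N]} \rrbracket$ and $\llbracket X_i \rrbracket$ are the uncertainty sets of $X_{[N]}$ and $X_i$, respectively.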
In the following lemma, we prove that pairwise independence does not imply total independence.
If $X_1, \ldots, X_N$ are totally independent, then they are also pairwise independent, but the converse is not true.
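(a) Let $X_1, \ldots, X_N$ be totally independent uncertainty variables. Then $\llbracket X_{[N]} \rrbracket = \llbracket X_1 \rrbracket \times \cdots \times \llbracket X_N \rrbracket$. Take $i, j \in [N]$ such that $i \neq j$. We know that the uncertainty set of $(X_i, X_j)$ is given by a simple projection of $\llbracket X_{[N]} \rrbracket$ on $\mathcal{X}_i \times \mathcal{X}_j$. Therefore,
$$\llbracket X_i, X_j \rrbracket = \llbracket X_i \rrbracket \times \llbracket X_j \rrbracket.$$
Since $i$ and $j$ were arbitrary with $i \neq j$, the collection $X_1, \ldots, X_N$ is also pairwise independent.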
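(b) We prove that the converse is not true by constructing a counterexample. Take three uncertainty variables $X_1, X_2, X_3$ that are pairwise independent, i.e., $\llbracket X_i, X_j \rrbracket = \llbracket X_i \rrbracket \times \llbracket X_j \rrbracket$ for all $i \neq j$, but whose joint uncertainty set satisfies $\llbracket X_1, X_2, X_3 \rrbracket \neq \llbracket X_1 \rrbracket \times \llbracket X_2 \rrbracket \times \llbracket X_3 \rrbracket$. Such a joint uncertainty set is given by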
which is shown in Figure 2.
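The set in Figure 2 is not reproduced here, but the phenomenon is easy to verify on a different, discrete example of our own choosing (not necessarily the set in the figure): the binary triples with even parity, $\{(0,0,0), (0,1,1), (1,0,1), (1,1,0)\}$. Every pairwise projection is all of $\{0,1\}^2$, yet the joint set is only half of $\{0,1\}^3$.

```python
from itertools import combinations, product

# Binary triples with even parity: {000, 011, 101, 110}.
joint = {t for t in product((0, 1), repeat=3) if sum(t) % 2 == 0}


def project(S, idx):
    """Projection of a set of tuples onto the coordinates listed in idx."""
    return {tuple(t[i] for i in idx) for t in S}


marginals = [project(joint, (i,)) for i in range(3)]  # each is {(0,), (1,)}

# Pairwise independence: each 2-D projection equals the product of the two marginals.
pairwise = all(
    project(joint, (i, j)) == {a + b for a in marginals[i] for b in marginals[j]}
    for i, j in combinations(range(3), 2)
)

# Total independence: the joint set equals the product of all three marginals.
total_product = {a + b + c for a in marginals[0] for b in marginals[1] for c in marginals[2]}
total = joint == total_product
print(pairwise, total)  # True False
```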
In the next section, we define the Bayesian uncertainty network, in which we extend the concept of Bayesian network for random variables to a collection of uncertainty variables. We will see that the independence properties are preserved in this extension from random variables to uncertainty variables. In Section V, we will define a point estimate and show that in special cases, it reduces to the MAP estimate.
IV Bayesian Uncertainty Networks
We now extend the notion of a Bayesian network, defined for a collection of random variables, to a collection of uncertainty variables. We call it the Bayesian uncertainty network.
Let $G = (V, E)$ be a directed acyclic graph (DAG). For each node $i \in V$, let $\mathrm{pa}(i)$ denote the set of parents of node $i$, i.e., for each $j \in \mathrm{pa}(i)$ there exists a link $(j, i) \in E$. A node $j$ is said to be a descendant of $i$ if there exists a directed path from node $i$ to node $j$ in $G$. We use $\mathrm{ND}(i)$ to denote the set of nodes, other than $i$ itself, that are non-descendants of $i$. Also, we use $V_0$ to denote the set of all nodes that have no parents, i.e., $V_0 = \{i \in V : \mathrm{pa}(i) = \emptyset\}$. We will often need to order the nodes in $V$ in a sequence. A canonical ordering of the nodes in $V$ is an ordering $(v_1, v_2, \ldots, v_N)$ in which parents are indexed before their children, i.e., for all $k$ and all $v_l \in \mathrm{pa}(v_k)$, we have $l < k$. Such an ordering of the nodes of a DAG always exists.
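A canonical ordering is what is usually called a topological ordering. Python's standard library computes one with `graphlib.TopologicalSorter` (Python 3.9+), which takes a mapping from each node to its predecessors, here its parent set, and raises `graphlib.CycleError` if the graph is not acyclic. The four-node DAG is a hypothetical example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG, given as node -> parent set pa(i).
parents = {
    "x0": set(),
    "x1": {"x0"},
    "x2": {"x0"},
    "x3": {"x1", "x2"},
}

# static_order() lists every node after all of its predecessors (here: its parents).
order = list(TopologicalSorter(parents).static_order())

# V_0: the nodes that have no parents.
roots = {i for i, pa in parents.items() if not pa}

# Check the defining property of a canonical ordering: parents come before children.
pos = {v: k for k, v in enumerate(order)}
assert all(pos[p] < pos[i] for i, pa in parents.items() for p in pa)
print(order, roots)  # x1 and x2 may appear in either order
```

Several canonical orderings usually exist (here x1 and x2 can be swapped); the factorization of a Bayesian uncertainty network is required to hold for any of them.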
A collection of uncertainty variables is characterized by its joint uncertainty set. We now formally define the notion of Bayesian uncertainty network, in which the uncertainty set of a collection of uncertainty variables factorizes according to an underlying DAG.
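A Bayesian uncertainty network is a tuple $(X_V, G)$ of uncertainty variables $X_V = \{X_i\}_{i \in V}$ and a DAG $G = (V, E)$, such that $\llbracket X_V \rrbracket$ factorizes according to $G$; namely, every node $i \in V$ is associated with a unique uncertainty variable $X_i$, and there exist conditional uncertainty maps
$$\llbracket X_i \mid \cdot \rrbracket : \prod_{j \in \mathrm{pa}(i)} \mathcal{X}_j \to 2^{\mathcal{X}_i}$$
for each $i \in V \setminus V_0$, such that, for any canonical ordering $(v_1, v_2, \ldots, v_M)$ of the nodes in $V \setminus V_0$, the joint uncertainty set of $X_V$ is given by
$$\llbracket X_V \rrbracket = \llbracket X_{V_0} \rrbracket \times \llbracket X_{v_1} \mid \cdot \rrbracket \times \llbracket X_{v_2} \mid \cdot \rrbracket \times \cdots \times \llbracket X_{v_M} \mid \cdot \rrbracket, \tag{31}$$
where the cross products are evaluated from left to right, each map $\llbracket X_{v_k} \mid \cdot \rrbracket$ is evaluated at the parent coordinates $x_{\mathrm{pa}(v_k)}$ of the tuple built so far, and $\llbracket X_{V_0} \rrbracket$ is a simple cross product of $\llbracket X_i \rrbracket$ over $i \in V_0$, namely
$$\llbracket X_{V_0} \rrbracket = \prod_{i \in V_0} \llbracket X_i \rrbracket.$$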
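Note that the factorization in (31) is well defined, provided we ignore the ordering of variables in the tuple. To see this, let us make use of Lemma 1 in Section II-B. Let $\llbracket X_i \rrbracket = \{x_i \in \mathcal{X}_i : g_i(x_i) \le c_i\}$ for each $i \in V_0$, and for all $i \in V \setminus V_0$ let
$$\llbracket X_i \mid x_{\mathrm{pa}(i)} \rrbracket = \{x_i \in \mathcal{X}_i : g_i(x_i, x_{\mathrm{pa}(i)}) \le c_i\},$$
for some functions $g_i$ and vectors $c_i$. Then, the factorization in (31) implies that the joint uncertainty set equals
$$\llbracket X_V \rrbracket = \{x_V : g_i(x_i, x_{\mathrm{pa}(i)}) \le c_i \ \text{for all } i \in V\}, \tag{34}$$
where $g_i(x_i, x_{\mathrm{pa}(i)})$ stands for $g_i(x_i)$ when $i \in V_0$.
This set remains the same for any canonical ordering of the set $V \setminus V_0$, except for the ordering of the variables in the tuple $x_V$. With a slight abuse of notation, we denote the cross product in (31) as
$$\llbracket X_V \rrbracket = \prod_{i \in V} \llbracket X_i \mid x_{\mathrm{pa}(i)} \rrbracket.$$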
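The claim that the factorization does not depend on the chosen canonical ordering can be checked on a small discrete network. In this sketch, the DAG, the domains, and the conditional maps are all hypothetical. The joint set is built by left-to-right cross products for two different canonical orderings and compared, ignoring coordinate order, with the constraint form in which every node's condition must hold at once.

```python
from itertools import product

# Hypothetical network: x0 -> x1, x0 -> x2, (x1, x2) -> x3; each X_i takes values in {0, ..., 4}.
DOM = range(5)
NAMES = ("x0", "x1", "x2", "x3")
ROOT_SET = {0, 1}  # [[X_0]]; x0 is the only parentless node
PARENTS = {"x1": ("x0",), "x2": ("x0",), "x3": ("x1", "x2")}


def cond_map(i, pa_vals):
    """Hypothetical conditional uncertainty maps [[X_i | x_pa(i)]]."""
    if i in ("x1", "x2"):
        (p,) = pa_vals
        return {v for v in DOM if abs(v - p) <= 1}
    a, b = pa_vals  # i == "x3"
    return {max(a, b)}


def joint_by_cross_products(order):
    """Left-to-right cross products: extend each partial tuple by every x_i in [[X_i | x_pa(i)]]."""
    partial = [{"x0": v} for v in ROOT_SET]
    for i in order:
        partial = [
            {**a, i: v}
            for a in partial
            for v in cond_map(i, tuple(a[p] for p in PARENTS[i]))
        ]
    # Represent each realization as sorted (name, value) pairs, so coordinate order is ignored.
    return {tuple(sorted(a.items())) for a in partial}


def joint_by_constraints():
    """Constraint form: keep every assignment that satisfies all node conditions at once."""
    result = set()
    for vals in product(DOM, repeat=len(NAMES)):
        a = dict(zip(NAMES, vals))
        if a["x0"] in ROOT_SET and all(
            a[i] in cond_map(i, tuple(a[p] for p in PARENTS[i])) for i in PARENTS
        ):
            result.add(tuple(sorted(a.items())))
    return result


j1 = joint_by_cross_products(["x1", "x2", "x3"])
j2 = joint_by_cross_products(["x2", "x1", "x3"])
j3 = joint_by_constraints()
print(j1 == j2 == j3, len(j1))  # True 13
```

Membership in each conditional map plays the role of the sub-level condition $g_i \le c_i$, with $g_i$ taken as one minus an indicator as in Lemma 1.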
A Bayesian network defined over random variables satisfies many conditional independence properties. In the next subsection, we show that these independence properties are retained by the Bayesian uncertainty network.
IV-A Conditional Independence Properties
We first define the local Markov property: a set of conditional independence properties satisfied by a Bayesian network defined over a collection of random variables. We will show that these properties also hold for the Bayesian uncertainty network.
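We say that the uncertainty variables $X_V$ satisfy the local Markov property according to a directed acyclic graph $G = (V, E)$ if
(1) each node $i \in V$ is associated with a unique UV $X_i$;
(2) for every $i \in V$, we have $X_i \perp X_{\mathrm{ND}(i) \setminus \mathrm{pa}(i)} \mid X_{\mathrm{pa}(i)}$, where $\mathrm{ND}(i)$ is the set of non-descendants of $i$.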
We now briefly recall the notion of d-separation in Bayesian networks, starting with a few definitions. We define a path on a DAG $G$ to be a sequence of nodes $(i_1, i_2, \ldots, i_K)$ such that either $(i_k, i_{k+1})$ or $(i_{k+1}, i_k)$ is a valid directed edge in $E$, for all $k \in [K-1]$. A node $i_k$ on a path is said to be serial if $(i_{k-1}, i_k) \in E$ and $(i_k, i_{k+1}) \in E$, or the same holds with both edges reversed; pictorially, $i_{k-1} \to i_k \to i_{k+1}$ (or $i_{k-1} \leftarrow i_k \leftarrow i_{k+1}$). Similarly, a node $i_k$ on the path is said to be diverging if $(i_k, i_{k-1}) \in E$ and $(i_k, i_{k+1}) \in E$; pictorially, $i_{k-1} \leftarrow i_k \to i_{k+1}$. Finally, a node $i_k$ on the path is said to be converging if $(i_{k-1}, i_k) \in E$ and $(i_{k+1}, i_k) \in E$; pictorially, $i_{k-1} \to i_k \leftarrow i_{k+1}$.
Let $A$, $B$, and $C$ be three disjoint collections of nodes in the DAG $G$. A path from $A$ to $B$ is a path that starts at some node in $A$ and ends at a node in $B$. We say that a path from $A$ to $B$ is blocked by $C$ if one of the following conditions is satisfied:
1) the path contains a node $i \in C$, and $i$ is either serial or diverging on the path;
2) the path contains a node $i$ that is converging on the path, and neither $i$ nor any of its descendants is in $C$.
We say that $A$ and $B$ are d-separated by $C$ if all paths from $A$ to $B$ are blocked by $C$. For a Bayesian network defined over a collection of random variables $\mathsf{X}_V$, it is known that if the node sets $A$ and $B$ are d-separated by $C$, then the random variables $\mathsf{X}_A$ and $\mathsf{X}_B$ are independent given $\mathsf{X}_C$. We show that this conditional independence relation also holds for Bayesian uncertainty networks.
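Enumerating all paths quickly becomes impractical on larger graphs. A standard equivalent criterion is: $A$ and $B$ are d-separated by $C$ if and only if removing $C$ disconnects $A$ from $B$ in the moral graph (edges undirected, co-parents joined) of the smallest ancestral set containing $A \cup B \cup C$. The sketch below implements that test rather than the path-based definition above; it operates only on the graph, and the DAG and function names are hypothetical.

```python
from collections import deque


def ancestors(parents, nodes):
    """All ancestors of `nodes` in the DAG, including the nodes themselves."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, A, B, C):
    """True if disjoint node sets A and B are d-separated by C (moralized ancestral graph test)."""
    keep = ancestors(parents, set(A) | set(B) | set(C))
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))  # parents of an ancestor are ancestors too
        for p in pa:  # drop edge directions
            adj[p].add(child)
            adj[child].add(p)
        for k, p in enumerate(pa):  # "marry" every pair of co-parents
            for q in pa[k + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Delete C, then search for any undirected path from A to B.
    frontier = deque(a for a in A if a not in C)
    reached = set(frontier)
    while frontier:
        v = frontier.popleft()
        for w in adj[v] - reached:
            if w not in C:
                reached.add(w)
                frontier.append(w)
    return not (reached & set(B))


# Hypothetical DAG: x0 -> x1, x0 -> x2, and the collider x1 -> x3 <- x2.
parents = {"x0": [], "x1": ["x0"], "x2": ["x0"], "x3": ["x1", "x2"]}
print(d_separated(parents, {"x1"}, {"x2"}, {"x0"}))        # True: x0 blocks, collider x3 unobserved
print(d_separated(parents, {"x1"}, {"x2"}, {"x0", "x3"}))  # False: observing x3 opens the collider
print(d_separated(parents, {"x1"}, {"x2"}, set()))         # False: open path x1 <- x0 -> x2
```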
We now show that the Bayesian uncertainty network satisfies both the local Markov property and all the conditional independence statements asserted by d-separation. The proof is omitted due to space constraints.
Let $G = (V, E)$ be a DAG. The following three statements are equivalent:
1) $(X_V, G)$ is a Bayesian uncertainty network.
2) $X_V$ satisfies all the local Markov properties according to $G$.
3) For all disjoint subsets $A$, $B$, and $C$ of $V$, whenever $A$ and $B$ are d-separated by $C$ in $G$, we have $X_A \perp X_B \mid X_C$.
This theorem implies that the conditional independence properties of the Bayesian network also hold for the Bayesian uncertainty network. In the next section, we define a point estimate for Bayesian uncertainty networks, and show that for the canonical representation of the uncertainty sets and the conditional uncertainty maps, the point estimate equals the MAP estimate.
V Point Estimates
In practice, we are generally interested in point estimates. For example, in robotic estimation problems, we would like to learn the true trajectory of a robot or a map of its surroundings. In regression or classification problems, we would like to estimate the model parameters.
In this section, we define point estimate for a Bayesian uncertainty network. In the Bayesian uncertainty network, we have some uncertainty variables that we observe, and some others which we want to estimate, given the observed variables.
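Let $(X_V, G)$ be a Bayesian uncertainty network, where $G = (V, E)$ is a DAG, and let the joint uncertainty set of $X_V$ be given by (34). Let $O \subseteq V$ denote the set of nodes that correspond to the observed data; namely, we observe $X_i = y_i$ for all $i \in O$, so that $y_O$ is known. Let $Q \subseteq V$ be the set of nodes corresponding to the uncertainty variables of interest, which we would like to estimate. We assume that $O$ and $Q$ are disjoint and that $O \cup Q = V$.
From the joint uncertainty set, we can compute the posterior uncertainty map $\llbracket X_Q \mid \cdot \rrbracket$ by projection; see Definition 2. Evaluating it at the observed data $y_O$ yields a posterior uncertainty set for $X_Q$, given $X_O = y_O$. This set is given by
$$\llbracket X_Q \mid y_O \rrbracket = \{x_Q : g_i(x_i, x_{\mathrm{pa}(i)}) \le c_i \ \text{for all } i \in V, \text{ where } x_O = y_O\}. \tag{38}$$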
This set gives us a sense of how uncertain we are about the variables of interest $X_Q$. However, a point estimate is generally required. We define one by introducing a scaling variable for each constraint in the posterior set (38); these scaling variables adjust the size of each set so as to yield an estimate. The point estimate for $X_Q$, given $X_O = y_O$, is defined as
The optimization problem in (39) is over all the variables $x_Q$ and all the scaling variables. However, as the output of the arg min, we show only a subset of these variables, namely $x_Q$, for notational convenience.
To illustrate the point estimate generated by the optimization problem (39), and the role of the scaling variables, we consider a simple example: a Bayesian uncertainty network of four variables, shown in Figure 3. Here, for all . The uncertainty set for is , and the conditional uncertainty maps are for all , where denotes a square centered at with side length . The true value of the uncertainty variable, namely , and the set are illustrated in Figure 3.
We do not know the true value of , and wish to estimate it by observing the variables . Let be the observations of the uncertainty variables . Using these, we can construct a posterior uncertainty set for by evaluating the posterior uncertainty map at . This gives the dark-red region shown in Figure 4, which is the posterior uncertainty set.
To obtain the point estimate, we introduce scaling parameters, which scale the size of each of the red rectangles in Figure 4 so that they intersect only at boundary points. The resulting estimate is shown in Figure 5. The rectangle corresponding to the one ‘far away’ observation is enlarged, whereas those corresponding to the other observations, which are closer to one another, are shrunk. This behavior is implicit in the definition of the point estimate (39): in computing the estimate, it favors observations that are close to one another over an observation that is farther away.
Next, we show a relation between the point estimate and the MAP estimate. Before we proceed, we note that the point estimate defined in (39) is not determined by the uncertainty sets alone: it depends on the functions used to represent the conditional uncertainty maps. For example, consider the specific case in which and for all . Let be any increasing function. Then, the posterior uncertainty set in (38) can also be written as
Thus, the point estimate will now equal
which, in general, differs from (39). The choice of the functions , and , has direct implications for the computational complexity of the estimate as well as for its accuracy and robustness. We leave this discussion for future work.
In the next subsection, we show a relation between the point estimate defined here for a Bayesian uncertainty network and the MAP estimate of the corresponding Bayesian network.
V-A Relation with MAP
In this subsection, we show a relation between the MAP estimate of a Bayesian network and the point estimate. A Bayesian network is a tuple $(\mathsf{X}_V, G)$ of a collection of random variables $\mathsf{X}_V$ and a DAG $G = (V, E)$. Each node $i \in V$ is associated with a unique random variable $\mathsf{X}_i$ in $\mathsf{X}_V$. Further, for each $i \in V$, a conditional probability density function $p(x_i \mid x_{\mathrm{pa}(i)})$ is defined. (We restrict here to continuous distributions for ease of presentation; these results can be extended to discrete-valued random variables as well.) The joint density function of $\mathsf{X}_V$ is given by the product factorization
$$p(x_V) = \prod_{i \in V} p(x_i \mid x_{\mathrm{pa}(i)}).$$
In what follows, we use $p(\cdot)$ to denote probability densities.
For a given Bayesian network $(\mathsf{X}_V, G)$, defined over the collection of random variables $\mathsf{X}_V$, we construct a canonical Bayesian uncertainty network $(X_V, G)$ with the same underlying DAG, in which the functions $g_i$ and the vectors $c_i$ in (34) are given by
and , for all . Note that for all , , and therefore reduces to a function of just .
We now show that the point estimate for the canonical Bayesian uncertainty network equals the MAP estimate for the corresponding Bayesian network.
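For the canonical Bayesian uncertainty network $(X_V, G)$, the point estimate $\hat{x}_Q$ defined in (39) satisfies
$$\hat{x}_Q = \arg\max_{x_Q} \, p(x_Q \mid y_O),$$
where $p(x_Q \mid y_O)$ denotes the probability density function of $\mathsf{X}_Q$ given $\mathsf{X}_O = y_O$.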
See Appendix B.
This result shows that the point estimate indeed equals the MAP estimate for a canonically defined Bayesian uncertainty network. It is possible to show a similar relation between the point estimate and the maximum-likelihood estimate, by omitting certain constraints in (39). We leave this discussion for our extended work.
We developed a new framework of uncertainty variables to model uncertainty in the real world. We proved equivalents of Bayes’ law and the law of total probability for uncertainty variables, and showed how they can be used to compute posterior uncertainty maps. We defined notions of independence, conditional independence, and pairwise independence for a given collection of uncertainty variables, and showed that they preserve the properties of independence defined over random variables.
In the second part, we developed a graphical model over a collection of uncertainty variables, namely the Bayesian uncertainty network, motivated by the Bayesian network defined over a collection of random variables. A Bayesian network satisfies certain natural conditional independence properties derived from the graph structure. We showed that all the natural conditional independence properties expected of a Bayesian network also hold for the Bayesian uncertainty network. We defined a notion of point estimate in a Bayesian uncertainty network, and proved that under a certain representational relation between the Bayesian uncertainty network and a Bayesian network, the point estimate equals the maximum a posteriori estimate.
In follow-up work, we develop other graphical models for uncertainty variables and show their benefits over some traditional approaches in applications such as robotic perception.
Appendix A: Theory of General Uncertainty Variables
In this section, we provide all the definitions needed for a general theory of uncertainty variables. We first define uncertainty maps and uncertainty variables.
An uncertainty map is defined as a set function that maps every point in the domain to a subset of either the same or another domain .
An uncertainty variable is a tuple , where is the domain of and is an uncertainty map.
We now define the marginal and conditional uncertainty maps, and the marginal uncertainty variable. Let be an uncertainty variable, where the domain is of the form . Let us define a projection operator of a set onto as follows:
A projection operator onto is a map