    # Non-deterministic inference using random set models: theory, approximation, and sampling method

A random set is a generalisation of a random variable, i.e. a set-valued random variable. Random set theory unifies other uncertainty descriptions such as interval variables, mass belief functions in the Dempster-Shafer theory of evidence, possibility theory, and sets of probability distributions. The aim of this work is to develop a non-deterministic inference framework, including theory, approximation and sampling method, that deals with inverse problems in which uncertainty is represented using random sets. The proposed inference method yields the posterior random set based on the intersection of the prior and the measurement-induced random sets. That inference method is an extension of Dempster's rule of combination, and a generalisation of Bayesian inference as well. A direct evaluation of the posterior random set might be impractical. We approximate the posterior random set by a random discrete set whose domain is the set of samples generated using a proposed probability distribution. We use the capacity transform density function of the posterior random set for this proposed distribution. This function has a special property: it is the posterior density function yielded by Bayesian inference of the capacity transform density function of the prior random set. The samples of such a proposed probability distribution can be directly obtained using the methods developed in the Bayesian inference framework. With this approximation method, the evaluation of the posterior random set becomes tractable.

05/29/2019


## 1 Introduction

The inverse problem deals with the identification of the parameters in a computational model given some measurement data. Typically, these parameters are not measured directly; rather, related quantities that are observable are measured. One can make a certain prediction about the parameters and evaluate the observable quantities. The difference between the predicted and the actual values of the observable quantities is a measure of how good a prediction is. There are two common approaches to the inverse problem: deterministic and non-deterministic inference. A deterministic inference targets the prediction that minimizes the difference between the predicted and the measured values of the observable quantities. In a non-deterministic inference, the uncertainty – the state of limited knowledge – about the parameters is updated based on the measurement data. For example, in Bayesian inference, the uncertainty is modelled and updated using random variables. A requirement to apply Bayesian inference is to formulate the prior uncertainty and the measurement error uncertainty with probability distributions [1, 2, 3]. However, in some situations, it could be difficult to derive a probability distribution that can express all the facets of a state of uncertainty. For example, prior knowledge can include nonspecificity, conflict, confusion, vagueness, biases, varying reliability levels of sources, and other types; and measurement data can contain noisy errors and also be coarsened [4, 5, 6].

Several methods were developed to model uncertainty in different situations, for example: random variables (rv), sets of possible values, e.g. an interval set, sets of probability distributions (probability boxes) [8, 5, 9], mass belief functions in the Dempster-Shafer (DS) theory of evidence [10, 11], and random sets (rs) [12, 13]. Each of these methods has its own advantages in the interpretation of uncertainty. In this paper, we focus on rs theory. A rs is a set-valued rv, i.e. a map from the elementary probability space to the subsets of some domain. The first systematic treatment of random (closed) sets is due to Matheron . The theory was then developed much further by Molchanov in  and by Nguyen in . Random set theory is a generalisation of the other listed uncertainty descriptions. Indeed, it is obvious that a rv is a special case of a rs. In the case that the map of a rs points to a deterministic set, it becomes a set of possible values. In the context of evidence theory, the mass belief function can be formulated via a map from a rv to subsets of some space; the mass belief function is hence a rs. Lastly, the probability box can be represented using a rs resulting from the union of the inversions of the distribution functions belonging to that box. Thanks to that generality, the rs is flexible in modelling different descriptions of uncertainty, while the mathematical formulation remains unchanged.

In this paper, we develop a non-deterministic inference framework in which random sets are used to model uncertainty. Since rs theory can formulate all the other listed uncertainty descriptions, such an inference framework can be applied to these cases as well as their combinations. The proposed inference is described in short as: the posterior rs is the intersection of the two input random sets, the prior rs and the rs induced by the measurement data. We shall show later in the paper that the proposed inference is a generalisation of Bayesian inference, i.e. when the prior rs is simplified to be a rv, the posterior rs is the posterior rv yielded by Bayesian inference. In the case that the two input random sets are expressed using evidence theory, the proposed inference method is identical to Dempster's rule of combination . Furthermore, if the input random sets are deterministic sets, the proposed inference yields the intersection of these sets, as expected.

Although the proposed inference rule is quite simple, the computation of its posterior rs is problematic. As in Bayesian inference, for a complex computational model, a sampling method is applied to characterize the posterior rs. A direct method to determine the set-valued samples, i.e. solving optimization problems to identify each set-valued sample, might be impractical. In this paper, the posterior rs is approximated using a random discrete set whose domain is the set of samples of a proposed distribution. Once the samples of that distribution together with their computational model responses are available, the set-valued samples of the random discrete set are easily identified, and no optimization process is required. Given the set-valued samples of the random discrete set, the characteristics of the posterior random set, e.g. its distribution function and its set-valued expectation, can be estimated.

The choice of the proposed distribution is crucial in order to achieve a good approximation while the computational efficiency remains acceptable, e.g. comparable with the sampling methods in the framework of Bayesian inference. In this work, the capacity transform density function of the posterior rs, denoted as πaT, is presented and used as the proposed probability density function (pdf). It has a nice property: the pdf πaT is exactly the posterior pdf yielded by Bayesian inference that updates the capacity transform pdf of the prior rs. In other words, we do not need to compute the posterior rs in advance and then evaluate its capacity transform pdf πaT. Inversely, one can sample the pdf πaT directly using Bayesian inference, and use the obtained samples to approximate the posterior rs. The methods developed in the framework of Bayesian inference, e.g. Markov chain Monte Carlo (MCMC) [15, 16] and inversion via conditional expectation [18, 19], can be directly applied to sample the pdf πaT. Furthermore, since πaT is a characteristic of the posterior rs, the required number of evaluations of the computational model is smaller than when using other non-informative proposed pdfs while producing the same level of approximation.

The rest of the paper is organized as follows: In Section 2, the background of rs theory is summarized. The relations of rs theory to the evidence theory (together with possibility theory), and to the probability box are also discussed. In Section 3, the inference method dealing with random sets is given. We shall show that the proposed inference method agrees with the Bayesian one when the prior rs is simplified to be a rv. In Section 4, the approximation of the posterior rs using a random discrete set is discussed. In Section 3.3, the capacity transform pdf of the posterior rs is defined and is chosen as the proposed pdf. The sampling method of the discrete rs that approximates the posterior rs is then developed. In Section 5, the method to estimate the set-valued expectation of the posterior rs is given. The developed methods are illustrated through a numerical example in Section 6. The paper is concluded in Section 7.

## 2 Background of the random set theory

In this section, the background of the rs theory is summarized. Details can be found in e.g. . Because the family of sets is rather rich, it is common to consider random closed sets which include the case of random singletons.

### 2.1 Random sets

Let (Ω, F, P) be a complete probability space, where Ω is the set of elementary events, F is a σ-algebra of events, and P is a probability measure. The set of closed subsets of the domain is denoted by 𝒳. A rs X is defined as a set-valued measurable map given as

 X:Ω→X. (1)

For the sake of simplification, we consider in this paper only integrally bounded random sets, i.e. E(sup{‖x‖ | x ∈ X(ω)}) is bounded, where E is the expectation operator. When the set X(ω) is a singleton set, i.e. it has only one element for all ω, the rs X is a (vector-valued) rv. The rs is hence considered a generalisation of the rv. Two measures of a rs X, the rs distribution (RSD) PX and the capacity functional TX, are defined on the σ-algebra of 𝒳 such that

 PX(X)=P({ω|X(ω)⊂X}), (2)

and

 TX(X)=P({ω|X(ω)∩X≠∅}), (3)

where X is a measurable set. One can directly obtain that

 0≤PX(X)≤TX(X)≤1. (4)

 TX(X1∪X2)≤TX(X1)+TX(X2),PX(X1∪X2)≥PX(X1)+PX(X2), (5)

where X1, X2 ⊂ X are such that X1 ∩ X2 = ∅. It is remarked that a probability distribution function is additive. When PX and TX are identical, they are a probability distribution function.
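These inequalities can be checked numerically. The sketch below is a minimal Python illustration, with a random interval model chosen purely for the demonstration: it estimates PX and TX by Monte Carlo for X(ω) = [c(ω) − r, c(ω) + r] and verifies Eq. (4) and Eq. (5) on two disjoint test sets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Random interval X(ω) = [c(ω) - r, c(ω) + r]; c and r are arbitrary choices
centers = rng.normal(0.0, 1.0, size=n)
r = 0.5

def P_X(pieces):
    """RSD of Eq. (2): P({ω | X(ω) ⊂ ∪ pieces}).
    X(ω) is an interval, so for well-separated pieces it must lie in one piece."""
    ok = np.zeros(n, dtype=bool)
    for a, b in pieces:
        ok |= (centers - r >= a) & (centers + r <= b)
    return ok.mean()

def T_X(pieces):
    """Capacity functional of Eq. (3): P({ω | X(ω) ∩ (∪ pieces) ≠ ∅})."""
    ok = np.zeros(n, dtype=bool)
    for a, b in pieces:
        ok |= (centers + r >= a) & (centers - r <= b)
    return ok.mean()

X1, X2 = (-2.0, -1.0), (1.0, 2.0)        # two disjoint test sets
p, t = P_X([X1]), T_X([X1])
assert 0.0 <= p <= t <= 1.0                              # Eq. (4)
assert T_X([X1, X2]) <= T_X([X1]) + T_X([X2]) + 1e-12    # Eq. (5), subadditive T
assert P_X([X1, X2]) >= P_X([X1]) + P_X([X2]) - 1e-12    # Eq. (5), superadditive P
```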

A rv x is a selection rv of the rs X if x(ω) ∈ X(ω) almost surely. The probability distribution P of a selection rv satisfies

 TX(X)≥P(X)≥PX(X). (6)

The set of all selection random variables is denoted as S. Since the rs is assumed to be integrally bounded, all the selection random variables of the rs are first-order rvs. The selection expectation of a rs is the closure of the set of all expectations of integrable selection random variables, i.e.

 ES(X)=cl{E(x(ω))|x(ω)∈S} (7)

where E(x(ω)) is the expectation of the selection rv x(ω), and cl is the closure operator.

In the rest of this section, the relations of the rs to the theory of evidence (together with possibility theory), and to the probability box are discussed. These two methods are usually applied to model the uncertainties in multi-expert systems or data that contain both epistemic and aleatory errors [5, 6, 20, 9].

### 2.2 Evidence theory and possibility theory

In evidence theory, a belief mass function M is defined over a finite family 𝒳 of subsets such that M(∅) = 0 and ∑X′∈𝒳 M(X′) = 1.
Two measures, the belief measure and the plausibility measure of a measurable set X, are respectively defined as

 Bel(X):=∑X′∈XM(X′)1(X′⊂X), (8)

and

 Pl(X):=∑X′∈XM(X′)1(X′∩X≠∅), (9)

where 1(⋅) is a logical operator that yields the unit value if the condition expressed inside the brackets is true, and zero otherwise. When M is a consonant mass function on a finite space, Pl is a possibility measure in possibility theory . Inversely, given a possibility measure, there is a consonant mass function M such that the possibility measure is the plausibility function corresponding to M (Theorem 2.5.4, page 42 in the reference ).
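Eqs. (8)-(9) translate directly into code. The minimal sketch below uses a hypothetical frame {1, …, 5} and made-up mass values purely for illustration.

```python
# Hypothetical focal sets over the frame {1,...,5} with made-up masses
masses = {
    frozenset({1, 2}): 0.5,
    frozenset({2, 3, 4}): 0.3,
    frozenset({4, 5}): 0.2,
}

def bel(A):
    """Belief of A (Eq. 8): total mass of focal sets contained in A."""
    A = frozenset(A)
    return sum(m for F, m in masses.items() if F <= A)

def pl(A):
    """Plausibility of A (Eq. 9): total mass of focal sets hitting A."""
    A = frozenset(A)
    return sum(m for F, m in masses.items() if F & A)

# Bel never exceeds Pl, mirroring Eq. (4) for random sets
assert bel({1, 2, 3}) == 0.5
assert abs(pl({1, 2, 3}) - 0.8) < 1e-12
```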

One important ingredient of evidence theory is Dempster's rule of combination. That combination rule is summarized in Appendix A. Based on that rule, we develop the inference method discussed in Sec. 3.

##### Random set representation of the mass belief function.

Let {X1, …, Xm} be the set of all subsets such that M(Xi) > 0, and let η be a uniform rv on [0, 1]. Let X be a rs defined as

 X(ω)=Xi if i−1∑k=1M(Xk)<η(ω)≤i∑k=1M(Xk).

The distribution function and the capacity functional of that rs are identical to the belief function and the plausibility function, respectively, i.e.

 PX(X)=Bel(X),TX(X)=Pl(X). (10)
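The construction above can be checked by simulation: sampling η and mapping it to the focal sets reproduces Bel and Pl as the containment and hit frequencies of the resulting random set. The focal sets and masses below are made-up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
focal = [frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({4, 5})]
m = np.array([0.5, 0.3, 0.2])            # made-up masses
cum = np.cumsum(m)

# X(ω) = X_i when η(ω) falls in the i-th cumulative mass interval
eta = rng.uniform(size=200_000)
idx = np.searchsorted(cum, eta)

A = frozenset({2, 3, 4})                 # test set
contained = np.array([F <= A for F in focal])
hits = np.array([bool(F & A) for F in focal])

P_emp = contained[idx].mean()            # estimate of P_X(A), Eq. (2)
T_emp = hits[idx].mean()                 # estimate of T_X(A), Eq. (3)

# Eq. (10): here Bel(A) = 0.3 and Pl(A) = 1.0
assert abs(P_emp - 0.3) < 0.01
assert T_emp == 1.0
```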

### 2.3 Set of probability distributions

A way to describe a set of possible probability distributions of a rv is to define upper and lower bounds on the cdf [5, 23]. Such an expression of the set of possible probability distributions is called a probability box. Here, we consider only the case that the components of the vector x are statistically independent. Let F̄ and F̲ be the upper and the lower bounds of the cdf, such that F̲(x) ≤ F̄(x). The cdf F of the considered rv follows the constraint

 F̲(x)≤F(x)≤F̄(x). (11)
##### Random set interpretation of the probability box

A rs can be constructed from the cdf bounds F̄ and F̲ as

 Xi(ω)=[F̄−1i(ηi(ω)),F̲−1i(ηi(ω))],i=1,…,n, (12)

here we abuse the notation and redefine η as a vector of independent uniform rvs in [0, 1]. We have

 PX((−∞,x))=F̲(x),TX((−∞,x))=F̄(x). (13)
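As a numerical sketch of Eqs. (12)-(13), the following uses two Gaussian cdfs as hypothetical bounds (F̄ the cdf of N(0,1), F̲ that of N(1,1), shifted right and hence smaller) and checks that the interval-valued samples reproduce the bounds.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
F_up = NormalDist(0.0, 1.0)     # upper cdf bound (hypothetical)
F_lo = NormalDist(1.0, 1.0)     # lower cdf bound: shifted right, hence smaller

# Eq. (12): one uniform rv drives both interval endpoints
eta = rng.uniform(1e-9, 1.0 - 1e-9, size=20_000)
lo = np.array([F_up.inv_cdf(e) for e in eta])
hi = np.array([F_lo.inv_cdf(e) for e in eta])

# Eq. (13): P_X((-inf, x]) = lower cdf, T_X((-inf, x]) = upper cdf
x0 = 0.5
P_est = np.mean(hi <= x0)   # X(ω) ⊂ (-inf, x0] iff its upper end is below x0
T_est = np.mean(lo <= x0)   # X(ω) hits (-inf, x0] iff its lower end is below x0
assert abs(P_est - F_lo.cdf(x0)) < 0.02
assert abs(T_est - F_up.cdf(x0)) < 0.02
```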

## 3 Inference in the context of random set theory

In this section, we consider the inference problem in which the prior uncertainty is represented using a rs X. In addition, in order to account for noisy errors and coarsening effects of measurement data, their information is also modelled by a rs Xd. That inference problem is explained in Sec. 3.1. The proposed method to update the prior rs given the rs Xd is discussed in Sec. 3.2. That inference method is based on Dempster's rule of combination and agrees with Bayesian inference when the prior random set is simplified to be a rv. Furthermore, the proposed method is also linked to Bayesian inference via the capacity transform pdf of random sets. This issue is discussed in Sec. 3.3.

### 3.1 Inference problem

The inference problem deals with the identification of parameters, denoted as x, given measurements of other quantities z such that the relation between x and z can be represented using a computational model h, i.e.

 z=h(x). (14)

In practice, the measurement noise is inevitable. Assuming that the noise is additive, the actual measured value is given as

 ~z=h(x)+ϵ, (15)

where ϵ is the actual value of the noise that occurred when performing the measurement. Furthermore, we deal with the problem that the measurement data do not give directly the value of ~z but a set Z, e.g. an interval set, such that

 ~z∈Z. (16)

That description of measurement data can be encountered in practice when the accuracy of measurement devices, e.g. the sensing resolution and/or the minimum (maximum) detectable values, is not negligible . Since the actual value of the noise is uncertain, it is modelled as a rv ϵ(ω). The rs of x induced by that measurement setup is given as

 Xd(ω)={x∈X|h(x)+ϵ(ω)∈Z}. (17)

It is remarked that the method developed in this work is still applicable when the uncertainty of the measurement is modelled as a random set.

### 3.2 Inference rule

In this section, we develop an inference method to update the prior rs X given the rs Xd. From Section 2, there are two ways to interpret a rs: (i) as a set of selection random variables, and (ii) as a set-valued rv. Under the former interpretation, one possible non-deterministic inference method is to apply Bayesian inference independently to each selection rv. In this work, we propose an inference method using the latter interpretation. It is described as: the posterior rs is the intersection of the prior and the measurement-induced random sets. This proposed inference method is based on Dempster's rule of combination  summarized in Appendix A. Under the first inference method, a rs is treated as a set of independent probability distributions, while the proposed method considers the rs as a single piece of information. The updated result of the former is less informative than that of the latter (see Theorem 3.6.6, page 94 in the reference ). Such a comparison of the two methods is illustrated on a simple problem reported in Appendix B. The inference method based on Dempster's rule of combination is given in the following.

###### Definition 1 (Inference of a rs using Dempster's rule of combination).

The update of the prior rs X given the measurement-induced rs Xd is a posterior (updated) rs Xa defined as

 Xa(ω):=X(ω)∩Xd(ω), (18)

and probability is updated as

 Pa(dω)=P(dω)1(Xa(ω)≠∅)1−K. (19)

where K is the degree of conflict, given by

 K=1−∫Ω1(Xa(ω)≠∅)P(dω). (20)

The indicator function 1(Xa(ω) ≠ ∅) in Eq. (19) is interpreted as a likelihood function. The update of the probability P is required to rule out empty sets, i.e. Pa({ω|Xa(ω) = ∅}) = 0, while the normalisation property Pa(Ω) = 1 is conserved. The larger the value K, the more significant the conflict between the prior knowledge and the measurement data becomes. If K = 1, the prior and the measurement-induced random sets are said to be in total conflict and no inference is possible. The updated rs is simplified to be a rv in the following cases: the prior rs is a rv; or the set Z has only one member and the function h is strictly monotonic almost surely. Furthermore, it is observed that Xa(ω) ⊂ X(ω). Hence, in a sequential update, i.e. the rs Xa becomes the prior rs when new data are available, the final updated rs might also be simplified to a rv.
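A minimal Monte Carlo sketch of Definition 1 for interval-valued sets; the prior intervals and the data-induced set below are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Prior rs: random intervals [c-1, c+1] with c ~ N(0,1) (arbitrary choice)
c = rng.normal(size=n)
prior_lo, prior_hi = c - 1.0, c + 1.0

# Measurement-induced set, here deterministic for simplicity
d_lo, d_hi = 0.5, 3.0

# Eq. (18): the posterior set-valued samples are the intersections
post_lo = np.maximum(prior_lo, d_lo)
post_hi = np.minimum(prior_hi, d_hi)
nonempty = post_lo <= post_hi

# Eq. (20): degree of conflict K = probability of an empty intersection
K = 1.0 - nonempty.mean()

# Eq. (19): keep only non-empty intersections; the renormalisation is implicit
# in averaging over the retained samples
post_lo, post_hi = post_lo[nonempty], post_hi[nonempty]

# Here the intersection is empty iff c < -0.5 or c > 4, so K ≈ Φ(-0.5) ≈ 0.309
assert abs(K - 0.309) < 0.01
```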

##### Relation with Bayes's rule.

We show in the following that the proposed inference method for rs agrees with Bayesian inference when the prior rs is a rv. In this case, the prior is modelled as a rv, denoted as x(ω), and the update method using Dempster's rule yields a posterior rv xa given by

 xa(ω)={x(ω)ifx(ω)∈Xd(ω)∅otherwise, (21)

where the probability is given as

 Pa(dω)=P(dω)1(xa(ω)≠∅)∫Ω1(xa(ω)≠∅)P(dω). (22)

The following theorem shows that the rv xa is the Bayesian update of the prior rv x.

###### Theorem 1.

The pdf of the rv xa defined in Eq. (21) is the posterior pdf yielded using Bayes's rule as

 πa(x)=π(x)L(x)∫Xπ(x)L(x)dx, (23)

where π(x) is the pdf of the prior rv x, and L(x) is the likelihood function

 L(x)=∫Ω1Xd(ω)(x)P(dω), (24)

where 1Xd(ω)(x) is the characteristic function of the set Xd(ω).

It is noted that, to avoid the Borel–Kolmogorov paradox, i.e. conditioning on an event of zero probability is undefined, we consider here only random sets X satisfying cl(int X(ω)) = X(ω), where int X(ω) is the set of interior points of X(ω), almost surely. For the special case that the rs is a rv, we mention it explicitly. This assumption also applies to the random sets Xd and Xa.

The proof of Theorem 1 is given in Appendix C. Theorem 1 shows that the proposed inference agrees with the Bayesian method when the prior uncertainty is modelled using a probability distribution. We shall show later in Section 3.3 that the likelihood function defined in Eq. (24) is proportional to the capacity transform pdf of the measurement-induced rs Xd.

### 3.3 Capacity transform density function

The capacity transform pdf πT of a rs X is defined as

 πT(x)=∫Ω1X(ω)(x)P(dω)∫X∫Ω1X(ω)(x)P(dω)dx. (25)

For πT to be well-defined, it is required that the normalisation integral in the denominator of Eq. (25) is finite and non-zero. In the context of the DS theory of evidence, this pdf is called the plausibility transform pdf [24, 25]. When x is a rv, the capacity transform pdf of the random set Bδ(x(ω)) – closed balls centred at x(ω) and having radius δ – converges to the pdf of x as δ → 0.
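For a concrete check of Eq. (25) and of the stated limit, the sketch below estimates the capacity transform pdf of the random interval X(ω) = [c(ω) − δ, c(ω) + δ] with c ~ N(0,1) by Monte Carlo. The model is an arbitrary choice; for it, the normalising constant is 2δ because the expected set length is 2δ.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
c = rng.normal(size=100_000)    # c ~ N(0,1), an arbitrary model choice
delta = 0.5

def pi_T(x):
    """MC estimate of Eq. (25) for X(ω) = [c - δ, c + δ].
    The denominator ∫ E[1_X(x)] dx equals the expected set length, 2δ."""
    return np.mean(np.abs(x - c) <= delta) / (2.0 * delta)

# Closed form: (Φ(x+δ) - Φ(x-δ)) / (2δ); as δ → 0 this tends to the pdf of c
phi = NormalDist()
x = 0.3
exact = (phi.cdf(x + delta) - phi.cdf(x - delta)) / (2.0 * delta)
assert abs(pi_T(x) - exact) < 0.01
```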

In a similar way to the derivation of πT, the capacity transform pdf of the updated rs Xa given in Eq. (18) is defined as

 πaT(x)∝∫Ω1Xa(ω)(x)Pa(dω). (26)

The capacity transform pdf of the measurement-induced rs Xd, see Eq. (17), is derived as

 πdT(x)∝∫Ω1Xd(ω)(x)P(dω)=∫Ω1Z(h(x)+ϵ(ω))P(dω). (27)

It is remarked that the right-hand side of Eq. (27) is the likelihood function defined in Eq. (24). The relation of πaT to πT and L (or πdT) is given in the following theorem.

###### Theorem 2 (Capacity transform pdf of the posterior rs).

The capacity transform pdf πaT of the posterior rs Xa is the posterior pdf obtained by Bayesian inference with the prior pdf πT and the likelihood function L given by Eq. (24), that is

 πaT(x)=πT(x)L(x)∫XπT(x)L(x)dx. (28)
###### Proof.

Inserting the expressions of Xa(ω) in Eq. (18) and Pa(dω) in Eq. (19) into Eq. (26), we have

 πaT(x)∝∫Ω1X(ω)(x)1Xd(ω)(x)P(dω). (29)

As X and Xd are independent, we have

 ∫Ω1X(ω)(x)1Xd(ω)(x)P(dω)=∫Ω1X(ω)(x)P(dω)∫Ω1Xd(ω)(x)P(dω)=πT(x)L(x). (30)

The expression of πaT can then be rewritten as in Eq. (28). ∎

Using Theorem 2, it is not required to compute Xa explicitly in advance in order to evaluate the updated capacity transform pdf πaT. Instead, we can sample the pdf πaT directly using Bayesian inference, and these samples are then used to approximate the posterior rs in Section 4.

## 4 Approximation of the posterior rs using a random discrete set

To reduce the computational burden, we approximate the posterior rs by a random discrete set. Instead of searching for all members of the set Xa(ω), we limit them to elements of a discrete set {x(1), …, x(κ)}, which are generated from a proposed pdf over the domain X, such that Xa(ω) ⊂ X almost surely. There are several ways to choose the proposed pdf. For example, one can use a uniform distribution (if X is bounded), or an unbounded distribution with a large (co)variance (if X is unbounded), or the distribution of a selection rv of the prior rs. In these examples, the choices are non-informative since they do not account for the measurement data. Here we use the pdf πaT as the proposed pdf. With this choice, we have an informative proposed pdf, while the computational methods that are well-developed in the framework of Bayesian inference can be directly applied to obtain the pdf πaT using Theorem 2.

The approximation using the random discrete set, denoted as ^Xa,κ, is formulated as

 Xa(ω)≈^Xa,κ(ω):=Xa(ω)∩{x(1),…,x(κ)}. (31)

Using the definition of Xa in Eq. (18), its approximating set can be expressed as

 ^Xa,κ(ω)={x(i)∈{x(1),…,x(κ)}|x(i)∈X(ω)∩Xd(ω)}. (32)

The larger the number κ, the better the approximation. Such an approximation is justified by the following theorem.

###### Theorem 3 (Discrete set approximation of bounded set using samples of a probability distribution).

Let X′ be a bounded set that contains no isolated point, let π be a pdf such that π(x) > 0 for all x ∈ X′, and let {x(1), …, x(κ)} be its samples; then the Hausdorff distance between the set X′ and the set X′ ∩ {x(1), …, x(κ)} converges to 0 as κ → ∞ almost surely.

It is remarked that the Hausdorff distance between two sets is zero only if they have identical closures. The proof of Theorem 3 is given in Appendix D.

### 4.1 Sampling method for the posterior random set

The samples of the pdf πaT can be generated using classical methods in the Bayesian inference framework. In this work, we use the Metropolis-Hastings MCMC algorithm [15, 16] for this task. Note that the algorithm provides not only the samples x(i) but also their model responses h(x(i)).
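A random-walk Metropolis-Hastings sketch for sampling πaT ∝ πT·L (Theorem 2). The two Gaussian factors below are stand-ins chosen only to exercise the sampler; for them, the target is the Gaussian with mean 0.8 and variance 1/5.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_target(x):
    """Unnormalised log π_T^a(x) ∝ log π_T(x) + log L(x) (Theorem 2).
    Both factors are stand-in Gaussians for this demonstration."""
    log_prior = -0.5 * x**2                    # π_T ~ N(0, 1) (assumed)
    log_lik = -0.5 * ((x - 1.0) / 0.5) ** 2    # L peaked at 1, scale 0.5 (assumed)
    return log_prior + log_lik

x, chain = 0.0, []
for _ in range(50_000):
    prop = x + rng.normal(scale=0.8)           # random-walk proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop                               # accept
    chain.append(x)

samples = np.array(chain[5_000:])              # discard burn-in
# The stand-in target is N(0.8, 1/5): mean 0.8, std ≈ 0.447
assert abs(samples.mean() - 0.8) < 0.05
assert abs(samples.std() - (1 / 5) ** 0.5) < 0.05
```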

##### MC simulation of the random discrete set ^Xa,κ.

Given the samples and their model responses, a MC simulation is then applied to obtain the samples ^Xa,κ(i), where i = 1, …, N, of the approximated random discrete set expressed in Eq. (32). This MC simulation is reported in Algorithm 1.

From the samples ^Xa,κ(i), where i = 1, …, N, the RSD and the capacity functional of the posterior rs can be approximated respectively as

 PXa(X)≈1NN∑i=11(^Xa,κ(i)⊂X),TXa(X)≈1NN∑i=11(^Xa,κ(i)∩X≠∅). (33)

Remarks: The evaluation of the model h is only required for the MCMC algorithm to sample πaT, but not in the later MC simulation summarized in Algorithm 1. In other words, the latter MC simulation is independent of the complexity of the model h. Therefore, the computational cost of our method is comparable with that of the classical methods used for Bayesian inference. With the proposed method, we do not need an optimization process to find the members of Xa(ω). Furthermore, when Xa becomes a rv, its pdf is the capacity transform pdf πaT, and the x(i) are therefore its samples.
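Algorithm 1 itself is not reproduced here; the sketch below illustrates the same idea under simple assumptions. Given samples x(i) (stand-ins for the MCMC output) and interval-valued samples of Xa, each discrete set-valued sample of Eq. (32) is obtained by membership tests only, and Eq. (33) then gives the RSD and capacity estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
kappa, N = 2_000, 5_000

# Stand-ins for the MCMC samples of π_T^a (drawn here from a plain normal)
x_samples = rng.normal(1.0, 0.7, size=kappa)

# Interval-valued samples of the posterior rs: prior intervals ∩ data set
c = rng.normal(size=N)
lo = np.maximum(c - 1.0, 0.5)
hi = np.minimum(c + 1.0, 3.0)
keep = lo <= hi                  # rule out empty intersections (Eq. 19)
lo, hi = lo[keep], hi[keep]

# Eq. (32): membership of each x(i) in each set-valued sample, no optimisation
inside = (x_samples[None, :] >= lo[:, None]) & (x_samples[None, :] <= hi[:, None])

# Eq. (33) on the test set X = [0.5, 1.5]
in_test = (x_samples >= 0.5) & (x_samples <= 1.5)
nonvoid = inside.any(axis=1)
P_est = np.mean(nonvoid & ~(inside & ~in_test[None, :]).any(axis=1))
T_est = np.mean((inside & in_test[None, :]).any(axis=1))
assert 0.0 <= P_est <= T_est <= 1.0
```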

## 5 Set-valued selection expectation of posterior random set

In this section, the support function of a given set is introduced. We use these functions as the means to evaluate the set-valued selection expectation of the posterior rs (the definition of the selection expectation is formulated in Eq. (7)).
A support function is defined as

 γ(X,ν)=supx∈X ν⋅x. (34)

where ν is a vector on the unit sphere Sn−1 and ν⋅x is the scalar product. Applying that support function to the rs Xa, we obtain the scalar-valued rv γ(Xa, ν).

###### Theorem 4.

If the basic probability space is non-atomic, the selection expectation is a convex set, and

 γ(ES(Xa),ν)=E(γ(Xa,ν)). (35)

The proof of Theorem 4 can be found in Chapter 2 of the reference  (Theorem 1.26). Thanks to Theorem 4, the set-valued selection expectation can be obtained via the probabilistic expectation of the rv γ(Xa, ν), e.g. using the MC method. From the samples of the random discrete set ^Xa,κ obtained using Algorithm 1, the expectation of the rv γ(Xa, ν) can be evaluated as

 E(γ(Xa,ν))≈E(γ(^Xa,κ,ν))=limN→∞1NN∑i=1γ(^Xa,κ(i),ν). (36)

From Theorem 4, if the elementary probability space is non-atomic, the set-valued expectation of the posterior rs can be identified as

 ES(Xa)=∩ν∈Sn−1{x∈X:ν⋅x≤E(γ(Xa,ν))}. (37)
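To make Eqs. (35)-(37) concrete, the sketch below uses a disk-valued random set (random centre and radius, an assumption made for the illustration), whose support function has the closed form γ(X, ν) = ν·c + R; its selection expectation is then the disk centred at E[c] with radius E[R].

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

# Disk-valued rs: random centre c(ω) and radius R(ω) (illustrative assumption)
centres = rng.normal(loc=[1.0, 2.0], scale=0.3, size=(n, 2))
radii = rng.uniform(0.5, 1.5, size=n)

# Directions ν on the unit circle; Eq. (34) has a closed form for a disk
thetas = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
nus = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)

# Eq. (36): MC estimate of E(γ(X, ν)) = ν·E[c] + E[R] for every direction
gamma_mean = np.mean(centres @ nus.T + radii[:, None], axis=0)

# For this rs the selection expectation (Eq. 37) is the disk centred at
# E[c] = (1, 2) with radius E[R] = 1; compare the support functions
expected = nus @ np.array([1.0, 2.0]) + 1.0
assert np.allclose(gamma_mean, expected, atol=0.02)
```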

## 6 Numerical example

### 6.1 Problem setting

To illustrate the developed method, the truss system shown in Fig. 1 is considered. For the sake of simplification, the inference is performed on two parameters: the stiffness E of the horizontal beams, and the applied forces q. In terms of notation, these parameters are sorted into the vector x, i.e. x = (E, q). We fix the other parameters, i.e. the bar cross-section areas and the stiffness of the diagonal bars, as constants. We shall use virtual measurement data of the nodal vertical displacements to perform the inference.

The truss system can be solved using a finite element (FE) model of bar elements as

 u=[A(x)]−1f(x) (38)

where u is the vector of nodal displacements, A(x) is the stiffness matrix depending on x, and f(x) is the nodal vector of applied forces. That FE model is represented as a function h, i.e. u = h(x).

##### Prior rs

The prior rs of x is expressed as:

• the rs of the (dimensionless) stiffness E is expressed using a probability box where the upper and lower bounds of its cdf are the lognormal distributions , respectively, where is the mean and is the variance;

• the randomness of the (dimensionless) applied force q is expressed using a mass belief function as: the possible events are , , , and their masses are given as and , .

We also assume that they are independent. The prior rs of E can be encountered in practice as the bounds of a prior cdf, while the prior rs of q might result from collecting information from different sources in which information is expressed using intervals.

Remark. The prior description of the uncertainties of the parameters is a mix of a probability box and a mass belief function. However, under the umbrella of a rs, their formulations are similar. That is one advantage of working with rs.

##### Virtual measurement data

We set a vector of parameters as the truth and perform the measurements virtually, i.e. equation (38) is solved with that vector to obtain the displacement vector. The vector is then perturbed by adding random errors, which are modelled following Gaussian distributions and are assumed to be independent. The observation sets are derived to model the sensing resolution of the measurement devices, which is assumed to be one unit in this example. The virtual measurement data of the displacements at the points a1, …, a11, see Fig. 1, are reported in Tab. 1. We consider two cases: (i) the inference is performed based on one measurement datum at point a1, and (ii) the inference is performed using all the virtual measurement data.

### 6.2 Numerical results

#### Samples of the updated capacity transform pdf πaT

The marginal capacity transform pdf πTE of the prior rs of E can be evaluated as

 πTE(x1)∝F̄E(x1)−F̲E(x1) (39)

where F̄E and F̲E are the upper and lower cdf bounds of the elastic modulus E. The marginal capacity transform pdf πTq of the prior rs of q is evaluated as

 πTq(x2)∝3∑i=1MQ(Qi)1Qi(x2). (40)

As E and q are independent, πT(x) = πTE(x1)πTq(x2). The likelihood function, see Eq. (24), is in this case simplified as

 L(x)=nd∏i=1(Fϵ(ui(x)−z̲i)−Fϵ(ui(x)−z̄i)), (41)

where nd is the number of measurement data used, Fϵ is the cdf of the Gaussian noise distribution, ui(x) is the vertical displacement at the point ai computed using the FE model in Eq. (38), and [z̲i, z̄i] is the observed interval of the displacement at the point ai, see Tab. 1.
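The interval-censored Gaussian likelihood of Eq. (41) is straightforward to implement; the noise scale and the numbers below are placeholders.

```python
import numpy as np
from statistics import NormalDist

F_eps = NormalDist(0.0, 0.1)   # noise cdf; the std 0.1 is a placeholder

def likelihood(u, z_lo, z_hi):
    """Eq. (41): L(x) = Π_i [F_ε(u_i(x) - z_lo_i) - F_ε(u_i(x) - z_hi_i)],
    the probability that each noisy response lands in its observed interval."""
    terms = [F_eps.cdf(ui - lo) - F_eps.cdf(ui - hi)
             for ui, lo, hi in zip(u, z_lo, z_hi)]
    return float(np.prod(terms))

# Two fictitious responses against unit-resolution observation intervals
L = likelihood(u=[1.45, 2.6], z_lo=[1.0, 2.0], z_hi=[2.0, 3.0])
assert 0.0 < L <= 1.0
# Responses far outside their intervals give (near-)zero likelihood
assert likelihood(u=[10.0], z_lo=[1.0], z_hi=[2.0]) < 1e-6
```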

Using an MCMC simulation, the samples of the posterior capacity transform pdf πaT are obtained. From these samples, the pdf πaT is estimated and illustrated in Fig. 2. It is observed that the updated capacity transform pdf converges to the true parameter vector when more data are involved, as expected. Note that at this step, the model responses h(x(i)), where i = 1, …, κ, are also obtained.

#### Samples of the discrete random set ^Xa,κ

Following the approximation method developed in Section 4, instead of directly sampling the posterior rs defined in Definition 1, we sample its approximation, i.e. the random discrete set ^Xa,κ. Using the sampled values x(i) and h(x(i)), where i = 1, …, κ, from the MCMC simulation, the set-valued samples of the random discrete set are obtained following Algorithm 1. No further evaluation of the FE model is required at this step.

From the samples ^Xa,κ(i), the upper and lower cdf bounds, i.e. TXa and PXa respectively, are evaluated using Eq. (33). The obtained marginal cdf bounds of the posterior rs of x are shown in Fig. 3. As observed in Fig. 3, the bounds of the cdfs are narrowed after updating. This is explained by the fact that Xa(ω) is a subset of X(ω), see Eq. (18). The more data become available, the thinner these bounds. Furthermore, the updated random sets get closer to the true value, similarly to the Bayesian inference of a rv.

Figure 4: The boundaries of the selection expectations of prior and posterior random sets obtained in two cases: using one measurement datum (at a1), and using all eleven measurement data.

Based on the samples ^Xa,κ(i), we compute the boundary of the selection expectation set defined in Eq. (7). This task requires the evaluation of the support functions γ. In this example, we use a non-atomic elementary probability space. Indeed, while a non-atomic elementary probability space can model both the prior random sets of E and q, an atomic one can only model the rs of q. Following Theorem 4, the selection expectation set is convex, and the support functions can be computed via the probabilistic expectation of the rv γ(Xa, ν) following Eq. (35). From the samples, the approximation of the expectation is evaluated using Eq. (36). The selection expectation boundaries belonging to the prior and the posterior random sets are illustrated in Fig. 4. It is observed that, when more measurement data become available, the selection expectation moves toward the true value. It also shrinks, in agreement with the shrinking of the cdf bounds in Fig. 3.

#### Convergence analysis

To investigate the convergence of the approximation expressed in Eq. (31), one can check the mean square error made when approximating the expectation of the support function E(γ(Xa,ν)) by E(γ(^Xa,κ,ν)), see Eq. (36). That mean square error is defined as

 MSEν(κ)=(∫Xκ[E(γ(Xa,ν))−E(γ(^Xa,κ,ν))]2κ∏i=1πaT(x(i))dx(i))1/2 (42)

Because $\mathbb{E}\big(\gamma(X_a,\nu)\big)$ is unknown, the mean square error is approximated as

$$\mathrm{MSE}_\nu(\kappa)\approx\left(\int_{\mathcal{X}^\kappa}\Big[\mathbb{E}\big(\gamma(\widehat{X}_{a,\kappa_\infty},\nu)\big)-\mathbb{E}\big(\gamma(\widehat{X}_{a,\kappa},\nu)\big)\Big]^2\prod_{i=1}^{\kappa}\pi^T_a(x^{(i)})\,\mathrm{d}x^{(i)}\right)^{1/2},\tag{43}$$

where $\kappa_\infty$ is a sample size large enough that $\widehat{X}_{a,\kappa_\infty}$ closely approximates $X_a$. The integral in the mean square error can be computed using the MC method.
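The MC evaluation of Eq. (43) can be sketched as follows. The sampling model below (Gaussian interval centers with a fixed half-width) is an assumed stand-in for the posterior capacity transform pdf, and the sample sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_support(intervals, nu):
    """E[h(X, nu)] for interval samples; h([a, b], nu) is nu*b for
    nu >= 0 and nu*a otherwise."""
    a, b = intervals[:, 0], intervals[:, 1]
    return np.mean(nu * b if nu >= 0 else nu * a)

def draw(n):
    """Assumed illustrative sampler standing in for pi_a^T: interval
    centers ~ N(0, 1) with fixed half-width 0.5."""
    c = rng.standard_normal(n)
    return np.column_stack([c - 0.5, c + 0.5])

def mse_estimate(kappa, kappa_ref=2000, n_rep=200, nu=1.0):
    """MC estimate of Eq. (43): root mean square difference between the
    expected support function at sample size kappa and at a large
    reference size kappa_ref (playing the role of kappa_infinity)."""
    ref = expected_support(draw(kappa_ref), nu)
    errs = [(expected_support(draw(kappa), nu) - ref) ** 2
            for _ in range(n_rep)]
    return np.sqrt(np.mean(errs))

print(mse_estimate(50), mse_estimate(500))
```

The outer `n_rep` loop replicates the $\kappa$-fold integral over $\mathcal{X}^\kappa$ by repeated resampling; the error decays roughly as $\kappa^{-1/2}$, consistent with the observed convergence.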

The normalized mean square errors for different values of $\nu$ are illustrated in Fig. 5. It can be observed that, with the generated samples of the proposed pdf, the normalized mean square error is smaller than 3%.

Figure 5: The normalized mean square error $\mathrm{MSE}_\nu$ for different values of $\nu$ (the illustrated case is when all data are used for the inference).

## 7 Conclusion

In this work, we develop a framework for non-deterministic inference using random set models. The inference rule is based on Dempster’s rule of combination. We show that the proposed inference is a generalisation of the Bayesian one. The posterior rs is approximated using a random discrete set whose domain is the set of samples of a proposed distribution. The capacity transform pdf of the posterior rs is chosen as the proposed distribution. With this choice, the required samples of the proposed distribution can be obtained using the methods developed in the Bayesian framework, e.g. MCMC. The computational burden is hence comparable with that of those methods in Bayesian inference.

A rs can equivalently formulate other uncertainty modelling methods, e.g. rv, set of possible values, mass belief function in the evidence theory, and probability box. Therefore, the developed inference framework can be applied to all these cases as well as their combinations. We have demonstrated this advantage in a numerical example in Section 6.

Since a computation method for the set-valued selection expectation has been developed, the framework can be extended toward decision-making theory. In addition, the special property of the capacity transform pdf is promising when dealing with data assimilation problems involving random sets.

## Appendix A Dempster’s combination rule

Let $M$ and $M_d$ be two mass belief functions; see Section 2.2 for their definition. Dempster’s rule combining $M$ and $M_d$ is given as

$$M_a(X)=\frac{1}{1-K}\sum_{X_i\in\mathcal{X}_M,\,X_j\in\mathcal{X}_{M_d}}M(X_i)\,M_d(X_j)\,\mathbb{1}(X_i\cap X_j=X)\tag{44}$$

where $\mathcal{X}_M$ and $\mathcal{X}_{M_d}$ are the sets of focal elements of $M$ and $M_d$, and $K$ is a measure of the amount of conflict between $M$ and $M_d$:

$$K=\sum_{X_i\in\mathcal{X}_M,\,X_j\in\mathcal{X}_{M_d}}M(X_i)\,M_d(X_j)\,\mathbb{1}(X_i\cap X_j=\emptyset).$$

In Dempster’s combination rule, a non-empty set $X$ has a positive belief mass after updating, i.e. $M_a(X)>0$, only if there exists a pair $(X_i, X_j)$ such that $X_i\cap X_j=X$. The inference method described in Definition 1 is equivalent to Dempster’s combination rule when the random sets result from the mass belief functions.
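For finite focal sets, Eq. (44) can be implemented directly; a minimal sketch (the mass functions below are illustrative):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule for mass functions over finite focal sets.

    m1, m2: dicts mapping frozenset focal elements to masses summing to 1.
    Returns the combined mass function, renormalised by 1/(1 - K),
    where K is the total mass assigned to conflicting (disjoint) pairs.
    """
    combined, K = {}, 0.0
    for (A, mA), (B, mB) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mA * mB
        else:
            K += mA * mB  # conflicting pair: contributes to K
    if K >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {X: m / (1.0 - K) for X, m in combined.items()}

m1 = {frozenset({"a", "b"}): 0.6, frozenset({"b", "c"}): 0.4}
m2 = {frozenset({"b"}): 0.5, frozenset({"c"}): 0.5}
out = dempster_combine(m1, m2)
print(out)
```

Here the pair $(\{a,b\},\{c\})$ is conflicting, so $K=0.3$ and the surviving intersections $\{b\}$ and $\{c\}$ are rescaled by $1/0.7$.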

## Appendix B Comparison of two inference strategies of rs on a simple example

In this section we compare two inference strategies: (i) using Dempster’s rule as discussed in Section 3, and (ii) using Bayesian inference on the set of selection random variables. Let $\Omega=\{\omega_1,\omega_2\}$ be the elementary probability space. The rs $X$ representing the prior knowledge about the variable is given by

$$X(\omega_1)=X_1,\quad X(\omega_2)=X_2.\tag{45}$$

Assume that we have a direct measurement giving us the information $x\in\{x_1,x_2\}$, where $x_1\in X_1$ and $x_2\in X_2$. The rs induced by this measurement is

$$X_d(\omega)=\{x_1,x_2\}\quad\forall\,\omega\in\Omega.$$

### B.1 Inference using Dempster’s rule

Using Dempster’s rule, the posterior rs is obtained as

$$X_a(\omega_1)=x_1,\quad X_a(\omega_2)=x_2.\tag{46}$$

In other words, the sets $X_1$ and $X_2$ shrink to $\{x_1\}$ and $\{x_2\}$ respectively. The rs $X_a$ is a rv, and its distribution function is given as

$$P_{X_a}(x_1)=P(\omega_1),\quad P_{X_a}(x_2)=P(\omega_2).\tag{47}$$
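The update in Eq. (46) is simply the elementwise intersection of the prior rs with the measurement-induced set; a minimal sketch, with assumed contents for $X_1$ and $X_2$ (the paper only requires $x_1\in X_1$ and $x_2\in X_2$):

```python
# prior random set on the two-point space {w1, w2};
# x3, x4 are assumed extra elements for illustration
X1 = {"x1", "x3"}
X2 = {"x2", "x4"}
Xd = {"x1", "x2"}  # measurement-induced set, constant over Omega

prior = {"w1": X1, "w2": X2}
# Dempster-style update: intersect each focal set with the measurement set
posterior = {w: focal & Xd for w, focal in prior.items()}
print(posterior)  # {'w1': {'x1'}, 'w2': {'x2'}}
```

Each focal set collapses to a singleton, so the posterior rs is a rv whose distribution is inherited from $P(\omega_1)$ and $P(\omega_2)$, as in Eq. (47).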

### B.2 Inference using Bayes’ rule on the set of selection random variables

Let $P_x$ be the probability distribution of a selection rv. Using Bayes’ rule, the posterior distribution obtained from the prior distribution $P_x$ is given as

$$P_{x_a}(x_1)=\frac{P_x(x_1)}{P_x(x_1)+P_x(x_2)},\quad P_{x_a}(x_2)=\frac{P_x(x_2)}{P_x(x_1)+P_x(x_2)}\quad\text{if }P_x(x_1)+P_x(x_2)>0.\tag{48}$$

In the case $P_x(x_1)+P_x(x_2)=0$, no update is possible. There are two special cases,

$$P_{x_a}(x_1)=1,\quad P_{x_a}(x_2)=0,\qquad\text{if }P_x(x_1)>0,\ P_x(x_2)=0,\tag{49}$$

and

$$P_{x_a}(x_1)=0,\quad P_{x_a}(x_2)=1,\qquad\text{if }P_x(x_1)=0,\ P_x(x_2)>0.\tag{50}$$

Based on that update, the posterior knowledge is represented as

$$0\le P_{x_a}(x_1),\,P_{x_a}(x_2)\le 1,\quad\text{and}\quad P_{x_a}\big(x\notin\{x_1,x_2\}\big)=0.\tag{51}$$
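Eqs. (48)–(50) are ordinary Bayesian conditioning of the selection distribution on the event $\{x_1,x_2\}$; a minimal sketch with assumed prior values:

```python
def condition_on(Px, event):
    """Bayes' rule: condition a discrete distribution Px on an observed
    event (a set of outcomes). Returns None when the event has zero
    prior probability, in which case no update is possible."""
    denom = sum(p for x, p in Px.items() if x in event)
    if denom == 0.0:
        return None
    return {x: (p / denom if x in event else 0.0) for x, p in Px.items()}

# assumed illustrative prior over selections
Px = {"x1": 0.2, "x2": 0.3, "x3": 0.5}
post = condition_on(Px, {"x1", "x2"})
print(post)  # {'x1': 0.4, 'x2': 0.6, 'x3': 0.0}
```

In contrast to the Dempster update, the posterior here depends entirely on the chosen prior selection distribution, and it is undefined when the event has zero prior mass.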

This conclusion is independent of the prior probability distribution.