# Assessing forensic evidence by computing belief functions

We first discuss certain problems with the classical probabilistic approach for assessing forensic evidence, in particular its inability to distinguish between lack of belief and disbelief, and its inability to model complete ignorance within a given population. We then discuss Shafer belief functions, a generalization of probability distributions, which can deal with both these objections. We use a calculus of belief functions which does not use the much criticized Dempster rule of combination, but only the very natural Dempster-Shafer conditioning. We then apply this calculus to some classical forensic problems like the various island problems and the problem of parental identification. If we impose no prior knowledge apart from assuming that the culprit or parent belongs to a given population (something which is possible in our setting), then our answers differ from the classical ones when uniform or other priors are imposed. We can actually retrieve the classical answers by imposing the relevant priors, so our setup can and should be interpreted as a generalization of the classical methodology, allowing more flexibility. We show how our calculus can be used to develop an analogue of Bayes' rule, with belief functions instead of classical probabilities. We also discuss consequences of our theory for legal practice.

## Authors


## 1 Introduction and motivation

It has been debated for several decades to what extent theories of probability are useful and/or suitable for assessing the value of evidence in legal and forensic settings. The debate mainly concentrates on the question whether or not the classical theory of probability, by which we mean the theory following Kolmogorov’s axioms, is suitable in legal problems.

The current dominant view proposes that we should use the classical probability axioms in court, in particular the axiom of additivity, i.e. $P(A \cup B) = P(A) + P(B)$ whenever $A$ and $B$ are disjoint. We are typically interested in an event, often denoted by $G$, that a given individual is the donor of a DNA profile, is the criminal in a certain crime, is the father of a certain child, or likewise. The main tool in this dominant view is Bayes’ formula

$$\frac{P(G \mid E)}{P(\bar{G} \mid E)} = \frac{P(E \mid G)}{P(E \mid \bar{G})} \cdot \frac{P(G)}{P(\bar{G})}, \tag{1.1}$$

where $E$ denotes relevant evidence, and where $\bar{G}$ denotes the complement of $G$ (or another set such that $G \cap \bar{G} = \emptyset$).

Bayes’ formula transforms prior odds $P(G)/P(\bar{G})$ into posterior ones $P(G \mid E)/P(\bar{G} \mid E)$ by multiplying the prior odds by the likelihood ratio $P(E \mid G)/P(E \mid \bar{G})$. We will give more detail on the legal practice and the use of Bayes’ rule in Section 6, but note that the use of Bayes’ formula presupposes the idea that all quantities of interest, including the prior, can indeed be expressed as probabilities satisfying the usual Kolmogorov axioms.
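The odds form of Bayes’ formula (1.1) can be illustrated with a minimal Python sketch. The function names and the numerical values of the prior odds and the likelihood ratio are hypothetical and serve only to show the arithmetic.

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' formula (1.1): posterior odds = LR * prior odds."""
    return likelihood_ratio * prior_odds

def odds_to_probability(odds):
    """Convert odds P(G)/P(not G) into the probability P(G)."""
    return odds / (1 + odds)

# Hypothetical numbers, chosen only for illustration.
prior = Fraction(1, 1000)        # prior odds P(G) / P(not G)
lr = Fraction(10000)             # likelihood ratio P(E|G) / P(E|not G)

post = posterior_odds(prior, lr)
print(post)                       # 10
print(odds_to_probability(post))  # 10/11
```

Note that the computation cannot even start without a numerical prior, which is exactly the presupposition discussed above.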

The alternative view insists that classical probability theory, with the axioms of Kolmogorov, is in many cases not suitable to be used in court or in forensics, for various reasons. We adhere to this alternative view, since we believe that there are situations in which the axioms of Kolmogorov are too restrictive, and we start by giving a number of reasons and examples to support this claim.

First, it has been observed by many that the classical theory cannot distinguish between lack of belief and disbelief. Here, disbelief is associated with evidence indicating the negation of a proposition, whereas lack of belief is associated with not having evidence at all. As Shafer  puts it, the classical theory does not allow one to withhold belief from a proposition without according that belief to the negation of the proposition. When we want to apply a theory of probabilities to legal issues, this becomes a relevant issue. Indeed, if certain exculpatory evidence in a case is dismissed, then this may result in less belief in the innocence of the suspect, but it gives no further indication for guilt.

The second shortcoming of the classical theory is its inability to model complete ignorance within a given population. We first give two examples, and then elaborate on this issue.

###### Example 1.1.

In the Netherlands, a well-known court case concerned a traffic accident caused by a car with two passengers. Although it was not disputed that the car caused the accident, it was unclear which of the two passengers was driving. The classical way to deal with this is to impose a fifty-fifty prior on the two passengers, but this does not correspond to reality. In reality we know that one of the two passengers drove, but we are otherwise ignorant. This cannot be modeled with classical probability.

This example can be generalized into the well known and classical island problem:

###### Example 1.2.

In the classical version of the island problem a crime has been committed on an island, making it a certainty that an inhabitant of the island committed it. In the absence of any further information, the classical point of view is to assign a uniform prior probability over all inhabitants concerning the question who is the culprit. The combination of assigning probability 1 to the collection of all inhabitants and probability 0 to each individual is impossible under the classical axioms of probability, although this may be exactly the prior one needs and wants to impose.

The last two examples may need some elaboration since it may not be so obvious why ignorance cannot be properly modeled by a uniform distribution over all possibilities.

Firstly, when we look at the island problem in Example 1.2, it is simply the case that we do not have information pointing to any individual. We do have group information, but no individual information. With a uniform distribution over the group, you nevertheless make a statement about each individual. This is very relevant in legal cases since these are against individuals, not against a whole population.

Secondly, it is simply not the case that a uniform distribution does not convey any information. Even in a frequentistic context, a uniform distribution tells us something when we repeat the experiment many times. Or, to phrase the same point differently, having probability $1/2$ for heads to come up in a coin flip is information.

Finally, an uninformative prior leads to different results than a uniform prior in our theory, as we will see in Section 3. The fact that these priors lead to different results confirms that these priors are really distinct: a uniform prior is not a prior representing ignorance, and using a uniform prior does not lead to the same results as using a prior that does represent ignorance.

The examples suggest that the usual axioms of probability may not always be appropriate in legal and forensic settings. In particular, there are at least two (related) problems: (1) the additivity of probabilities, that is, $P(A \cup B) = P(A) + P(B)$ if $A$ and $B$ are disjoint, is not always desirable, and (2) there is no way to model ignorance in a given population in the classical theory. We are not the first ones to observe this, of course. Already in the 1970s there were at least two major attempts, by Cohen and Shafer respectively, to develop a theory of probabilities, or generalizations thereof, for legal settings outside the realm of the axioms of Kolmogorov. However, both these attempts have been criticized fiercely (references below), for various and different reasons, and nowadays they are not used at all in assessing evidence in legal settings.

This history fits in a classical pattern in science at large. There is a certain theory (in our case classical probability axioms) which is supposed to describe or explain certain phenomena (in our case dealing with uncertainty in legal and forensic context). Then certain problems or anomalies arise (for our case, see the examples above). Nevertheless, despite these anomalies, the theory is often upheld, often mainly by the lack of an acceptable alternative. In our case, the proposed alternatives of Cohen and Shafer seemed to have too many problems in themselves, and as such they were not acceptable and the classical theory prevailed. The philosopher Thomas Kuhn described this generic process for instance in , together with many examples from the history of science.

We think that Shafer’s approach is the most promising when it comes down to application possibilities and general acceptance, since it is conceptually simpler and closer to classical probability than the approach of Cohen. This is the reason that in this article we restrict our attention to Shafer’s approach.

Before we introduce belief functions properly, we should review for what reasons Shafer’s belief functions have been essentially ignored in the forensic and mathematical literature, other than being criticized. To be sure, they are not completely ignored, but they are typically framed as somewhat of a curiosity and not taken very seriously, for instance in Dawid’s otherwise excellent notes . We list three main points of concern that can be found in the literature, and briefly indicate our position.

First of all, Shafer himself reports that many of his critics rejected his belief functions because of the lack of a suitable betting interpretation. In other words, it is the interpretation that seemed to be an obstacle. However, this criticism does not target the theory, but only the development of the theory. Only if it turned out that a suitable betting interpretation is impossible would there be a basis to reject the theory. In this light, this criticism should only be seen as a request to further solidify the underpinnings of the theory by a betting interpretation. Shafer argues that no such behavioral interpretation is necessary. We do think it is a legitimate request and note that in the companion paper to the current one we do formulate a very natural betting interpretation of Shafer’s belief functions.

A second, and much more important, point of concern about Shafer’s belief functions can be found in the literature: the main reason given for rejecting Shafer’s belief functions is that the calculus of these belief functions, as put forward in the so-called Dempster rule of combination, is arbitrary, not well founded and therefore unacceptable. This rule is supposed to describe how different belief functions should be combined into a new one. In the current article, however, we do not use Dempster’s rule of combination. The only thing we need is Dempster-Shafer conditioning, which we motivate without deriving it from Dempster’s rule of combination and which is much less controversial, if at all. Hence the current article is consistent with any stance on Dempster’s rule of combination. We do note, however, that we have an opinion on the matter: we in fact reject Dempster’s rule, and motivate this in our companion paper.

A third point of concern is articulated, again, in . The very fact that a belief function can allow zero belief to both an event and its complement, makes it, according to , inadequate to be used in legal matters, where a decision has to be taken, and where not making a decision is not an option. However, we do not think that this objection is well founded, since it seems to mix up the notion of belief with the act of making a decision about declaring someone guilty. Based on a belief function which assigns belief zero to both guilt and innocence, a suspect will not be convicted. Actually, belief functions seem to do more justice to the situation, since a judge will make a certain decision only if there is enough evidence. One should only convict someone if the belief in the hypothesis that he or she is actually guilty, is high enough, and this does not seem to have anything to do with the fact that the belief in a proposition and its complement can both be zero.

Finally, in  it is questioned whether or not belief functions respect some ‘rules’ of reasoning when it comes to knowledge. For the most part, this criticism does not apply to our theory and how we want to use it, and we hope to convince the reader that there are many situations in which the use of belief functions is very reasonable, by supplying a number of examples.

The goal of the present article is to convince the forensic and legal communities of the fact that Shafer’s belief functions can be put to good use in legal and forensic matters, by using a calculus without Dempster’s rule, thereby taking away the main obstacle for the use of Shafer’s theory and providing the communities with an acceptable extension of the classical theory. We do this by computing (conditional) belief functions in classical problems like the island problem and parental identification problems. Furthermore, we show through examples how we can use an analogue of Bayes’ formula. Shafer’s belief functions can take care of the problems noticed in the two examples above. Indeed, they are so general that they can distinguish between lack of belief and disbelief, and they are also flexible enough to be able to model complete prior individual ignorance.

We want to stress that the belief functions of Shafer are a generalization of classical probabilities, and that everything that can be modeled with classical probability theory, can therefore also be modeled with these belief functions. If there is a good reason to take a classical informative prior, the theory allows for that. We lose nothing. As expected, having no prior information leads to different outcomes. This is only reasonable we think, since the classical procedure imposes a prior which is legally problematic.

Although we do address and resolve the problems with the classical theory that we mentioned above, this does not mean that the theory we present solves all problems. The difficulty of quantifying evidential support is just one example of a problem we are still facing in our theory, as is the choice of the relevant population. Despite this, we think the theory is a significant step forward, and perhaps it is hard to imagine how a mathematical theory would satisfactorily solve these issues anyway.

On the theoretical side we will be as brief as possible in this article, but we make sure that the current paper remains self-contained. An in-depth and extensive theoretical development is carried out in the companion paper to the current paper, including an extensive discussion of betting interpretations, a law of large numbers for belief functions, a thorough discussion of independence, a detailed analysis of Dempster’s rule, and an in-depth discussion of the interpretation of belief functions. For the forensic and legal applications in the current paper, such an in-depth mathematical study is not necessary, but we do note the fact that the subject is very interesting from both a theoretical and an applied perspective, and we plan to develop it much more in the future.

The current paper is organized as follows. In Section 2 we discuss the basic theory by introducing belief functions and conditioning. Next we apply the theory in Section 3 to the classical island problems, and in Section 4 to the problem of parental identification. In Section 5 we explain our analogue of Bayes’ rule, and finally in Section 6 we conclude by discussing some consequences of our theory for legal and forensic casework.

## 2 The basic theory

Let $\Omega$ be a finite outcome space, for instance the members of a certain population. We want to make statements about the elements of $\Omega$ in the presence of uncertainty, like saying something about who is the culprit in a certain crime. The classical way to do this is by means of a suitable probability distribution $p$ on $\Omega$. A probability distribution assigns a non-negative support $p(\omega)$ to each element $\omega \in \Omega$ in such a way that the total support is equal to 1. We may, for instance, express our uncertainty about who is the culprit by means of such a probability distribution. The probability that the culprit can be found in a subset $A$ of $\Omega$ is then equal to

$$P(A) := \sum_{\omega \in A} p(\omega). \tag{2.1}$$

The probability measure $P$ can be interpreted as subjective, frequentistic or otherwise, depending on the context and personal taste. The support $p(\omega)$ represents the probability or confidence in the outcome $\omega$, and $P(A)$ represents our probability or confidence in an outcome which is contained in $A$. In classical probability theory, a subset of $\Omega$ is also called an event or sometimes also a hypothesis; we make no distinction between the two phrases. The probability measure $P$ describes the probability of all such events or hypotheses.

Next we define basic belief assignments and belief functions. The difference between a basic belief assignment and a probability distribution is that the former assigns support to nonempty subsets of $\Omega$ rather than to individual outcomes. We write $2^\Omega$ for the collection of all subsets of $\Omega$.

###### Definition 2.1.

A function $m : 2^\Omega \to [0,1]$ is a basic belief assignment if $m(\emptyset) = 0$ and

$$\sum_{C \subseteq \Omega} m(C) = 1. \tag{2.2}$$

Whereas $p(\omega)$ represents the probability or confidence in the outcome $\omega$, $m(C)$ represents our confidence in an outcome in $C$ which is not specified further. It may appear that there is not much difference between $p$ and $m$, but in fact there is. The crucial difference between $p$ and $m$ is that the support of a subset $C$ of $\Omega$ is not immediately related to the support of the elements or subsets of $C$. For instance, if we have no clue whatsoever about the outcome, that is, if we have no information at all, then we may express this by putting $m(\Omega) = 1$ and $m(C) = 0$ for all strict subsets $C$ of $\Omega$. Or we may take $m(\{a, b\}) > 0$ and simultaneously $m(\{a\}) = m(\{b\}) = 0$, for $a, b \in \Omega$, meaning that we have evidence for the union of $a$ and $b$, but no further information to distinguish between them.

It is also possible that a basic belief assignment only assigns positive support to singletons. In such a case, we are back in the classical situation. The quantity $m(C)$ is sometimes referred to as the evidential support of $C$. We should view $m$ as the analogue of $p$ in the classical description above.

We next define the analogue of $P$, which is called a belief function. We want to quantify how much belief we can assign to a subset $A$ of $\Omega$. To this end, we consider all sets $C$ in $2^\Omega$ with $C \subseteq A$, which are precisely the events whose occurrence implies the occurrence of $A$. The belief in a set $A$ now is the sum of the support of all subsets of $A$. In terms of evidence, the belief in $A$ is the total evidential support of everything implying $A$.

###### Definition 2.2.

Given a basic belief assignment $m$, the corresponding belief function $\mathrm{Bel} : 2^\Omega \to [0,1]$ is defined by

$$\mathrm{Bel}(A) := \sum_{C \subseteq A} m(C). \tag{2.3}$$
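Definitions 2.1 and 2.2 translate almost literally into code. The following is a minimal Python sketch in which a basic belief assignment is stored as a dictionary from subsets to masses; the outcome space, the masses and the function names are hypothetical choices made for this illustration only.

```python
from fractions import Fraction

def is_basic_belief_assignment(m):
    """Definition 2.1: non-negative masses, none on the empty set, total 1."""
    return (
        m.get(frozenset(), 0) == 0
        and all(mass >= 0 for mass in m.values())
        and sum(m.values()) == 1
    )

def belief(m, A):
    """Bel(A) as in (2.3): total mass of all sets C contained in A."""
    A = frozenset(A)
    return sum(mass for C, mass in m.items() if C <= A)

# Hypothetical outcome space and masses.
omega = frozenset({"a", "b", "c"})
m = {
    frozenset({"a"}): Fraction(1, 2),
    frozenset({"a", "b"}): Fraction(1, 4),
    omega: Fraction(1, 4),
}

print(is_basic_belief_assignment(m))  # True
print(belief(m, {"a"}))               # 1/2
print(belief(m, {"a", "b"}))          # 3/4
print(belief(m, {"b", "c"}))          # 0
```

In this sketch the belief in `{"a"}` and the belief in its complement `{"b", "c"}` sum to 1/2 rather than 1; the remaining mass sits on sets that do not decide between the two.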

We next discuss a number of examples which should convince the reader that in many situations there are natural belief functions which adequately describe the situation.

###### Example 2.3.

Suppose we want to state our beliefs about a suspect being guilty or innocent, so $\Omega = \{\text{guilty}, \text{innocent}\}$ is our outcome space. If the evidential support of the suspect being innocent is $\alpha$, and we have no further information, then we have $m(\{\text{innocent}\}) = \alpha$, $m(\{\text{guilty}\}) = 0$ and $m(\Omega) = 1 - \alpha$. The support of $1 - \alpha$ assigned to $\Omega$ should be interpreted as the amount of ignorance. Notice that the belief that the suspect is guilty is not equal to 1 minus the belief that the suspect is innocent. The corresponding belief function is given by $\mathrm{Bel}(\{\text{guilty}\}) = 0$, $\mathrm{Bel}(\{\text{innocent}\}) = \alpha$ and $\mathrm{Bel}(\Omega) = 1$.

###### Example 2.4.

The function $m$ for which $m(\Omega) = 1$ and $m(C) = 0$ for all other $C \subseteq \Omega$ is a basic belief assignment. The corresponding belief function assigns belief 1 to $\Omega$ and belief zero to all strict subsets of $\Omega$. This belief function expresses total ignorance, except for the fact that the outcome must be in $\Omega$. As such it addresses the problem noticed in Example 1.2.

###### Example 2.5.

(Probability distributions) Every probability distribution is a belief function, as we already indicated. To see this, let $P$ be a probability distribution. Set $m(\{a\}) = P(\{a\})$ for all $a \in \Omega$ and $m(C) = 0$ for all $C$ such that $|C| > 1$. Then we get

$$\mathrm{Bel}(A) = \sum_{a \in A} m(\{a\}) = \sum_{a \in A} P(\{a\}) = P(A) \tag{2.4}$$

for every $A \subseteq \Omega$. Probability distributions are belief functions for which the corresponding basic belief assignment only assigns positive support to singletons. If $m(C) > 0$ for some $C$ with $|C| > 1$, then $\mathrm{Bel}$ is not a probability distribution because it is not additive: for any nonempty, disjoint $A, B$ such that $A \cup B = C$ we find

$$\mathrm{Bel}(A \cup B) > \mathrm{Bel}(A) + \mathrm{Bel}(B). \tag{2.5}$$
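The contrast between (2.4) and (2.5) can be checked numerically. The sketch below, with a hypothetical three-element outcome space and hypothetical masses, compares a basic belief assignment supported on singletons with the totally ignorant assignment of Example 2.4.

```python
from fractions import Fraction

def belief(m, A):
    """Bel(A) as in (2.3): total mass of all sets C contained in A."""
    A = frozenset(A)
    return sum(mass for C, mass in m.items() if C <= A)

omega = frozenset({1, 2, 3})

# Masses on singletons only: a classical probability distribution.
m_prob = {
    frozenset({1}): Fraction(1, 2),
    frozenset({2}): Fraction(1, 3),
    frozenset({3}): Fraction(1, 6),
}

# All mass on omega: total ignorance, as in Example 2.4.
m_ignorant = {omega: Fraction(1)}

A, B = {1, 2}, {3}  # disjoint, with union omega

# Additive case, as in (2.4): 5/6 + 1/6 = 1.
print(belief(m_prob, A) + belief(m_prob, B), belief(m_prob, A | B))

# Strictly superadditive case, as in (2.5): 0 + 0 < 1.
print(belief(m_ignorant, A) + belief(m_ignorant, B), belief(m_ignorant, A | B))
```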

In concrete situations, the basic belief assignment can very well be based on classical probabilistic considerations when it is reasonable to do so. Here is an important example which we will discuss in full detail in Section 3.

###### Example 2.6.

(The island problem) Let $X = \{1, \ldots, N+1\}$ be the population of the island. At the scene of the crime a DNA profile $\Gamma$ is found which we know has frequency $p$ in the population. This means that a randomly chosen person has probability $p$ to have the characteristic, independent of the other individuals. We remark that this assumption is in the realm of classical probability theory. This is reasonable since the frequency interpretation of classical probability works well within the context of DNA profiles.

Our basic belief assignment should capture our prior knowledge, that is, prior to the fact that we found the DNA profile at the crime scene. We have prior knowledge about two different things: (1) we know that we have selected $S$ uniformly at random from the population, and (2) we know the population frequency of the DNA profile to be $p$. Both these items can be satisfactorily described with a classical probability distribution, and with classical independence assumptions. This leads to the following basic belief assignment, prior to the evidence.

We set $\Omega = X \times X \times \{0,1\}^{N+1}$ (the collection of triples $(c, s, y)$ with $c, s \in X$ and $y \in \{0,1\}^{N+1}$) and let $C$, $S$ and $\Gamma_i$ be the projections on respectively the first, second and $(i+2)$-th coordinate. $C$ represents the criminal, $S$ the selected individual, and $\Gamma_i = 1$ indicates that the $i$-th individual has characteristic $\Gamma$. Without any reference to the crime, but with reference to the particular characteristic $\Gamma$, we can model the characteristics and the choice of $S$ by defining the following basic belief assignment on $\Omega$. Let $y_1, \ldots, y_{N+1} \in \{0,1\}$ be such that $\sum_{i=1}^{N+1} y_i = k$. Then for $x \in X$,

$$m(C \in X, S = x, \Gamma_1 = y_1, \ldots, \Gamma_{N+1} = y_{N+1}) = \frac{1}{N+1}\, p^k (1-p)^{N+1-k}, \tag{2.6}$$

and $m(A) = 0$ for all other subsets $A$ of $\Omega$. We will typically write $A(x, y_1, \ldots, y_{N+1})$ for the set in (2.6) and similar ones later on, and not explicitly mention the first coordinate.

This basic belief assignment expresses the facts that characteristics are random and independent with success probability $p$ and that $S$ is chosen uniformly on $X$. But also, and very importantly, it says nothing about the crime and nothing about the identity of the criminal $C$. For instance, for all $x \in X$, we assign zero belief to the event that $C = x$.

In the theory we are about to develop, classical probability distributions are replaced by the more general belief functions, allowing for more flexibility. In Examples 2.4 and 2.6 above, the basic belief assignments reflect prior knowledge, or the absence thereof. As such it is reasonable to call the basic belief assignments and the corresponding belief functions our priors in these examples. These priors play the same role as the prior probabilities in the classical situation, with the crucial difference that they need not be a classical probability distribution anymore. As such they need not be additive and they can be genuinely uninformative if that is what corresponds to reality.

### 2.1 Conditioning

The next item on the agenda is to investigate how belief functions change when additional or new information is provided. This is akin to the classical situation in which a prior probability is updated into a posterior one as we briefly mentioned in the introduction. We explain how this works in our setting.

The rule we propose for conditioning is described as follows. Suppose we have a basic belief assignment $m$ and corresponding belief function $\mathrm{Bel}$. We want to condition on an event $H$. The evidential support of $B$ now becomes evidential support of $B \cap H$ if $B$ is consistent with $H$ in the sense that $B \cap H \neq \emptyset$. If $B \cap H = \emptyset$, then the new evidential support of $B$ becomes zero. Next we rescale the support in such a way that the support again sums up to $1$. This can of course only be done if the belief in $H^c$ is not $1$. This leads to the following definition.

###### Definition 2.7.

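Let $m$ be a basic belief assignment and $\mathrm{Bel}$ the corresponding belief function. For $H \subseteq \Omega$ such that $\mathrm{Bel}(H^c) \neq 1$ we define the conditional basic belief assignment $m_H$ by

$$m_H(A) := \frac{\sum_{B \cap H = A} m(B)}{1 - \sum_{B \cap H = \emptyset} m(B)}, \tag{2.7}$$

for $A \neq \emptyset$ and $m_H(\emptyset) = 0$. The corresponding conditional belief function is defined as

$$\mathrm{Bel}_H(A) = \sum_{B \subseteq A} m_H(B). \tag{2.8}$$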

The notion of conditioning in (2.7) is known as Dempster-Shafer conditioning. Shafer  derives the formula as a special case of Dempster’s rule of combination. We, however, see Definition 2.7 as standing on its own, motivated by the description above.

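In the special case that $\mathrm{Bel}$ is a probability distribution $P$, our notion of conditional belief coincides with the notion of conditional probability. Indeed, in that case we can write

$$\mathrm{Bel}_H(A) = \frac{\sum_{\omega \in A \cap H} m(\{\omega\})}{\sum_{\omega \in H} m(\{\omega\})} = P(A \mid H), \tag{2.9}$$

for every $A \subseteq \Omega$ and $H$ such that $P(H) > 0$.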

###### Example 2.8.

Suppose we have a case in which the suspects are two parents and their son, so $\Omega = \{\text{Father}, \text{Mother}, \text{Son}\}$. We have a lot of evidence that points to the parents, none of which points to one of them in particular. Further, we have some evidence that points to the son. The corresponding basic belief assignment is, say, $m(\{\text{Father}, \text{Mother}\}) = 9/10$ and $m(\{\text{Son}\}) = 1/10$. Under the hypothesis that it is a man, i.e. $H = \{\text{Father}, \text{Son}\}$, the evidence pointing to the parents counts as evidence pointing to the father, so

$$m_H(\{\text{Father}\}) = \frac{9}{10}. \tag{2.10}$$
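The conditioning rule (2.7) can be sketched in a few lines of Python. The function name `condition` and the dictionary representation are hypothetical; the masses 9/10 and 1/10 for the family example are assumptions chosen to be consistent with (2.10).

```python
from fractions import Fraction

def condition(m, H):
    """Dempster-Shafer conditioning (2.7) of a basic belief assignment on H."""
    H = frozenset(H)
    # Mass of all sets inconsistent with H, i.e. with empty intersection.
    conflict = sum(mass for B, mass in m.items() if not (B & H))
    if conflict == 1:
        raise ValueError("cannot condition: the belief in the complement of H is 1")
    m_H = {}
    for B, mass in m.items():
        A = B & H
        if A:  # the support of B moves to B ∩ H and is then rescaled
            m_H[A] = m_H.get(A, Fraction(0)) + mass / (1 - conflict)
    return m_H

# Example 2.8, with assumed masses 9/10 (parents) and 1/10 (son).
m = {
    frozenset({"Father", "Mother"}): Fraction(9, 10),
    frozenset({"Son"}): Fraction(1, 10),
}
m_H = condition(m, {"Father", "Son"})
print(m_H[frozenset({"Father"})])  # 9/10
print(m_H[frozenset({"Son"})])     # 1/10
```

In this example no mass is lost in the conditioning step, so the normalizing constant is 1 and the evidence for the parents is transferred to the father in full.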

## 3 The island problems

Now that we have established and discussed the main theoretical issues, we move on to our first important examples, namely the well-known and classical island problems. The context of the island problems is classical. A crime has been committed on an island with $N+1$ inhabitants, so that we can be sure that one of them is the culprit. Now some characteristic of the criminal, e.g. a DNA profile, is found at the scene of the crime and we may assume that this profile originates from the culprit. Then we somehow select an individual $S$ from the island, who happens to have the same characteristic as the criminal. The question is what we can say about the probability or belief in the event that $S$ is in fact the criminal. This is not a well defined question yet, as it depends on the way $S$ was found. We distinguish between two cases: the cold case in which we randomly select an inhabitant, and the search variant in which we consider the inhabitants one by one in a random order, until we find an inhabitant with the characteristic found at the crime scene.

With the island problems, our belief functions allow us to assign a zero prior belief in the guilt of any of the individuals of the island, while at the same time assigning belief one to the full population. This seems better suited to the legal context than the classical Bayesian setting since assigning a non-zero prior probability to an individual, without any evidence against the individual itself other than belonging to the population, seems unreasonable. Of course, we have to make modeling assumptions, as in the classical case, and we will discuss these below. But as we shall see, the outcomes are different from the classical outcome: if we assume total prior ignorance, the belief that $S$ is the culprit is in our setting different from the classical probability that he is guilty under a uniform prior. We turn to the examples now.

### 3.1 The cold case

We continue Example 2.6. We write $H_s$ for the set $\{S = s, \Gamma_C = \Gamma_S = 1\}$. Now we want to incorporate the crime and condition on the event $H_s$ that $s$ was chosen, and that the criminal and the chosen individual both have characteristic $\Gamma$. According to our theory of conditioning, we have to assign the mass originally assigned to $A(x, y_1, \ldots, y_{N+1})$ in (2.6) to the intersection of this set with $H_s$, and normalize suitably. Clearly, this intersection is non-empty only if $x = s$ and $y_s = 1$, and in this case the intersection can be written as

$$\{C \in \{i : y_i = 1\},\ S = s;\ \Gamma_1 = y_1, \ldots, \Gamma_{N+1} = y_{N+1}\}. \tag{3.1}$$

Next we need to find the correct normalization. We claim that the total mass of sets of the form in (2.6) which have non-empty intersection with $H_s$ is given by

$$\frac{1}{N+1} \sum_{j=0}^{N} \binom{N}{j} p^{j+1} (1-p)^{N-j} = \frac{p}{N+1}. \tag{3.2}$$

Note that one can obtain the outcome at the right hand side by either computing the sum, or by simply noticing that for the intersection to be non-empty it is necessary and sufficient that $S = s$ and that $\Gamma_s = 1$. Since (2.6) describes a classical probabilistic experiment, the total mass of the basic belief assignment that has non-empty intersection with $H_s$ is just the probability that in this classical experiment, $S = s$ and $\Gamma_s = 1$. This happens with probability $p/(N+1)$.

It follows that for $y_1, \ldots, y_{N+1}$ such that $y_s = 1$ and $\sum_{i=1}^{N+1} y_i = k+1$,

$$m_{H_s}\bigl(A(s, y_1, \ldots, y_{N+1}) \cap H_s\bigr) = \frac{p^{k+1} (1-p)^{N-k}}{N+1} \cdot \frac{N+1}{p} = p^k (1-p)^{N-k}.$$

Summarizing, we have that for $y_1, \ldots, y_{N+1}$ such that $y_s = 1$ and $\sum_{i=1}^{N+1} y_i = k+1$,

$$m_{H_s}(C \in \{i : y_i = 1\},\ S = s;\ \Gamma_1 = y_1, \ldots, \Gamma_{N+1} = y_{N+1}) = p^k (1-p)^{N-k}. \tag{3.3}$$

Another way of writing this is as follows: for any $A \subseteq X$ such that $s \in A$ and $|A| = k+1$, we have, writing $1_A$ for the indicator function of $A$,

$$m_{H_s}(C \in A,\ S = s,\ \Gamma_1 = 1_A(1), \ldots, \Gamma_{N+1} = 1_A(N+1)) = p^k (1-p)^{N-k}. \tag{3.4}$$

This is the new basic belief assignment after conditioning on $H_s$ and can be called the posterior in this case.

We can now use this result to compute belief in certain events, but before we do that, we would like to discuss the chosen modeling. We defined the basic belief assignment without using that we know that someone in the population has $\Gamma$, namely the criminal $C$; this information is only added once we condition on $H_s$. Perhaps some readers might find this somewhat counterintuitive. Why not use the information that there is at least one individual in $X$ which has $\Gamma$ in the definition of the basic belief assignment? We now explain how this can be done, and show that it leads to the same belief assignment after proper conditioning.

Let $y_1, \ldots, y_{N+1}$ be such that $\sum_{i=1}^{N+1} y_i = k+1$, $k \geq 0$. We define the basic belief assignment $m'$ as follows:

$$m'(C \in \{i : y_i = 1\},\ S = x,\ \Gamma_1 = y_1, \ldots, \Gamma_{N+1} = y_{N+1}) = \frac{1}{N+1} \cdot \frac{p^{k+1} (1-p)^{N-k}}{1 - (1-p)^{N+1}}. \tag{3.5}$$

This belief assignment gives us our belief in the characteristics conditioned on the event that at least one $y_i$ is equal to 1. It expresses no knowledge about the identity of $C$ other than that we know that $C$ has $\Gamma$. We denote the event in (3.5) by $A'(x, y_1, \ldots, y_{N+1})$.

Next we condition on the event $H_s$. As before, $A'(x, y_1, \ldots, y_{N+1}) \cap H_s$ can only be nonempty if $x = s$ and $y_s = 1$. In that case $A'(x, y_1, \ldots, y_{N+1}) \cap H_s$ is exactly the event in (3.1) and the correct normalization is

$$\frac{1}{N+1} \sum_{j=0}^{N} \binom{N}{j} \frac{p^{j+1} (1-p)^{N-j}}{1 - (1-p)^{N+1}} = \frac{p}{N+1} \cdot \frac{1}{1 - (1-p)^{N+1}}. \tag{3.6}$$

It now follows that after conditioning we obtain the same belief assignment as before, since the extra term $1/(1 - (1-p)^{N+1})$ appears in both the numerator and the denominator. We conclude that the two approaches lead to the same result, as they should.

Now that we have computed the new basic belief assignment, we can compute our belief in certain events, most notably our belief in the event that $C \in B$ for $B \subseteq X$ with $s \in B$. Writing $|B|$ for the number of elements in the set $B$, we have for this event:

$$\begin{aligned}
\mathrm{Bel}_{H_s}(C \in B) &= \sum_{E \subseteq \{C \in B\}} m_{H_s}(E) \\
&= \sum_{A \subseteq B \,:\, s \in A} m_{H_s}(C \in A;\ S = s;\ \Gamma_1 = 1_A(1); \ldots; \Gamma_{N+1} = 1_A(N+1)) \\
&= \sum_{k=0}^{|B|-1} \binom{|B|-1}{k} p^k (1-p)^{N-k} \\
&= (1-p)^{N+1-|B|}.
\end{aligned} \tag{3.7}$$

An interesting special case occurs when $B = \{s\}$. The conditional belief that $C = s$ is then given by

$$\mathrm{Bel}_{H_s}(C = s) = (1-p)^N, \tag{3.8}$$

which follows from (3.7) by taking $|B| = 1$. This formula has a simple interpretation: the belief that $s$ is the criminal is just the probability that all other members of the population are excluded since they have the wrong profile.
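As a sanity check on (3.7) and (3.8), the cold case can be computed by brute force for a small island. The sketch below enumerates the sets carrying mass in the prior (2.6), intersects each with the conditioning event, and renormalizes. The function name, the labelling of inhabitants as $0, \ldots, N$ and the values $N = 3$, $p = 1/10$ are hypothetical choices for this illustration.

```python
from fractions import Fraction
from itertools import product

def cold_case_belief(N, p, s, B):
    """Conditional belief that the criminal lies in B, by direct enumeration."""
    X = range(N + 1)  # inhabitants labelled 0..N
    B = set(B)

    def in_H(outcome):
        # Conditioning event: s was selected, criminal and s both have the profile.
        c, x, y = outcome
        return x == s and y[c] == 1 and y[s] == 1

    surviving = Fraction(0)  # total mass consistent with the conditioning event
    posterior = {}
    for x in X:
        for y in product((0, 1), repeat=N + 1):
            k = sum(y)
            mass = Fraction(1, N + 1) * p**k * (1 - p) ** (N + 1 - k)  # (2.6)
            # The prior set leaves the criminal unspecified: c ranges over X.
            focal = frozenset((c, x, y) for c in X if in_H((c, x, y)))
            if focal:
                posterior[focal] = posterior.get(focal, Fraction(0)) + mass
                surviving += mass
    # Belief in {C in B}: rescaled mass of all sets contained in that event.
    return sum(
        mass / surviving
        for focal, mass in posterior.items()
        if all(c in B for c, _, _ in focal)
    )

N, p, s = 3, Fraction(1, 10), 0
print(cold_case_belief(N, p, s, {0}))     # 729/1000, i.e. (1 - p)^N
print(cold_case_belief(N, p, s, {0, 1}))  # 81/100, i.e. (1 - p)^(N + 1 - 2)
```

The enumeration is exponential in $N$ and is meant only to confirm the closed-form expressions on a toy example.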

It is interesting to compare this answer to the classical one, in which a uniform prior is taken. In the classical case, the posterior probability that $C = s$ is equal to

$$\frac{1}{1 + Np}. \tag{3.9}$$

We observe that

$$\frac{1}{1 + Np} = \prod_{k=0}^{N-1} \frac{1 + kp}{1 + (k+1)p} > \left(\frac{1}{1+p}\right)^{N} > \left(\frac{1 - p^2}{1+p}\right)^{N} = (1-p)^N. \tag{3.10}$$

Hence in our setting, the belief that $C = s$ is always smaller than in the classical case, something we can intuitively understand by recalling that we assign prior belief zero to this event. To give some indication of the difference between the two answers, if (for , then (3.9) , while (3.8) .
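To get a feeling for the size of the gap between (3.8) and (3.9), both expressions can be evaluated side by side. The values of $N$ and $p$ below are hypothetical and are not taken from the text; the function names are likewise illustrative.

```python
from fractions import Fraction

def belief_total_ignorance(N, p):
    """Conditional belief (3.8) that the selected person is the criminal."""
    return (1 - p) ** N

def classical_uniform_prior(N, p):
    """Classical posterior probability (3.9) under a uniform prior."""
    return 1 / (1 + N * p)

# Hypothetical population sizes and profile frequencies.
for N, p in [(100, Fraction(1, 100)), (100, Fraction(1, 1000)), (1000, Fraction(1, 100))]:
    bel = belief_total_ignorance(N, p)
    cls = classical_uniform_prior(N, p)
    print(N, p, float(bel), float(cls))
```

For instance, with $N = 100$ and $p = 1/100$ the classical posterior is exactly $1/2$, whereas the belief $(99/100)^{100}$ is roughly $0.366$.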

Since belief functions generalize probability distributions, we should be able to re-derive the classical result (3.9) using our approach, and we now show that this is indeed the case. If we want to take a uniform prior for the criminal, then the basic belief assignment, denoted by , is as follows. Let be such that . Then our prior is given by

$$m^c(C=t, S=x, \Gamma_1=y_1, \ldots, \Gamma_{N+1}=y_{N+1}) = \frac{1}{(N+1)^{2}}\,p^{k}(1-p)^{N+1-k}, \tag{3.11}$$

for all individuals $t$ and $x$, and $m^c = 0$ for all other subsets of the outcome space. Note that the corresponding belief function is a probability distribution, since only singletons have positive basic belief. Next we condition on the same event $H_s$ as before. The intersection of the set in (3.11) with $H_s$ is only non-empty if $x = s$ and $y_s = 1$. The probability that $S = s$ and $\Gamma_s = 1$ is $p/(N+1)$, as before. Given this, the probability that also $\Gamma_C = 1$ is 1 if $C = s$ and $p$ if $C \neq s$. Hence the intersection is non-empty with probability

$$\frac{p}{N+1}\left(\frac{1}{N+1} + \frac{N}{N+1}\,p\right) = \frac{p(1+Np)}{(N+1)^{2}}.$$

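We can now compute the conditional belief assignment, but we only need the following (with $y_s = 1$ and $\sum_{i \neq s} y_i = k$):

$$m^c_{H_s}(s, s, y_1, \ldots, y_{N+1}) = \frac{p^{k+1}(1-p)^{N-k}}{(N+1)^{2}}\cdot\frac{(N+1)^{2}}{p(1+Np)} = \frac{p^{k}(1-p)^{N-k}}{1+Np}.$$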

Summing over all $y$ with $y_s = 1$ and using the binomial theorem, we see that the conditional belief that $C = s$ is given by (3.9), as required.

This example illustrates that we lose nothing by working with belief functions, and that belief functions only add flexibility. If a certain classical prior is reasonable then we can take that prior and work with it. If there are reasons to have a non-classical prior, for instance complete ignorance within a given population, then belief functions are flexible enough to deal with this.
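The classical derivation can be checked by brute force for small populations. The sketch below enumerates the singletons of the uniform prior (3.11) and conditions on an event read off from the surrounding text as an assumption: individual `s` is selected, and both `s` and the criminal carry the trait. The function name and the 0-based indexing are hypothetical.

```python
from itertools import product


def classical_posterior_by_enumeration(N, p, s=0):
    """P(C = s | H_s) under the uniform prior (3.11), by enumeration."""
    n = N + 1
    num = den = 0.0
    for y in product((0, 1), repeat=n):
        if y[s] != 1:
            continue  # the selected individual s must carry the trait
        k = sum(y)
        # Mass of each singleton {C = c, S = s, Gamma = y} in (3.11).
        mass = p ** k * (1 - p) ** (n - k) / n ** 2
        for c in range(n):
            if y[c] == 1:  # the criminal must carry the trait as well
                den += mass
                if c == s:
                    num += mass
    return num / den
```

For small `N` the result agrees with $1/(1+Np)$ up to floating-point error; the enumeration grows as $2^{N+1}$, so it is only meant as a sanity check.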

### 3.2 The cold case generalized

So far we have assumed that we can describe the realization of the characteristic with a classical probability distribution with independence between different individuals. The flexibility that is possible with belief functions was only used in order to model complete ignorance about the culprit.

But it is possible to be even more flexible with our knowledge of the characteristics, and to use belief functions for them as well. This entails a certain uncertainty about the occurrence of the characteristic. We write $p$ for the probability that we determine that an individual has the property, $q$ for the probability that we determine that an individual does not have the property, and $r$ for the probability that we cannot determine whether an individual has the property. This leads to the following prior basic belief assignment as the analogue of (2.6):

$$m(S=x, \Gamma_1\in Y_1, \ldots, \Gamma_{N+1}\in Y_{N+1}) = \frac{1}{N+1}\,p^{k_1}q^{k_0}r^{N+1-k_0-k_1}, \tag{3.12}$$

where $Y_i \in \{\{0\}, \{1\}, \{0,1\}\}$, $k_1 = |\{i : Y_i = \{1\}\}|$ and $k_0 = |\{i : Y_i = \{0\}\}|$.

From here on, the analysis is more or less as before. We denote the set in (3.12) by . Again we want to condition on . The intersection is nonempty if and only if and and can in that case be written as

$$\{C\in\{i : Y_i\neq\{0\}\},\ S=s,\ \Gamma_1\in Y_1, \ldots, \Gamma_{N+1}\in Y_{N+1}\}. \tag{3.13}$$

The total mass of these sets is

$$\frac{p}{N+1}, \tag{3.14}$$

for the same reason as before. Hence we have the posterior

$$m_{H_s}\bigl(A(s, Y_1, \ldots, Y_{N+1})\cap H_s\bigr) = \frac{p^{k_1}q^{k_0}r^{N+1-k_0-k_1}}{N+1}\cdot\frac{N+1}{p} = p^{k_1-1}q^{k_0}r^{N+1-k_0-k_1}.$$

For $B$ with $s \in B$ we find

$$\mathrm{Bel}_{H_s}(C\in B) = q^{N+1-|B|}. \tag{3.15}$$

When $r = 0$ and $B = \{s\}$ this is just (3.8), as it should be. It is noteworthy that the belief in $C \in B$ depends only on $q$ and $|B|$, and not on $p$ and $r$. This is to be expected: the belief that $s$ is the criminal is the probability that we know for sure that all other members of the population do not have the correct profile.
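Formula (3.15) can be reproduced for small populations by enumerating the focal sets of (3.12) and applying the conditioning step explicitly. This is a sketch under stated assumptions: the labels `"yes"`, `"no"` and `"unknown"` stand for $Y_i = \{1\}$, $\{0\}$ and $\{0,1\}$, a focal set survives conditioning only if `s` is determined to have the trait (consistent with the total mass (3.14)), and the function name is hypothetical.

```python
from itertools import product


def belief_generalized_cold_case(N, p, q, r, B, s=0):
    """Bel_{H_s}(C in B) for the prior (3.12), by enumerating focal sets."""
    n = N + 1
    prob = {"yes": p, "no": q, "unknown": r}
    target = set(B)
    total = 0.0  # mass of focal sets meeting H_s, cf. (3.14)
    bel = 0.0    # part of that mass committed to {C in B}
    for Y in product(prob, repeat=n):
        if Y[s] != "yes":
            continue
        mass = 1.0 / n
        for label in Y:
            mass *= prob[label]
        total += mass
        # After conditioning, C ranges over everyone not excluded, cf. (3.13).
        candidates = {i for i, label in enumerate(Y) if label != "no"}
        if candidates <= target:
            bel += mass
    return bel / total
```

With `s` contained in `B`, the returned value matches $q^{N+1-|B|}$ regardless of how the remaining mass is split between $p$ and $r$.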

### 3.3 The search case

In the search variant, we do not choose a random individual but we check the inhabitants one by one in a random order, until an individual with the relevant characteristic is found. However, we only take into account the result of the search and not any information about the search itself. As a consequence, the search case boils down to picking a random individual from the subset of the population that has the characteristic. As in the cold case, there are (at least) two ways to approach the situation. In the first approach, we do not immediately link the characteristic to the crime, and then it is a priori not certain that an individual with the characteristic found at the scene of the crime exists in the population. In the second approach, we condition the distribution of the characteristics on the fact that we know at least one individual has it.

For the first approach, the space must be rich enough to accommodate the possibility that no one has the characteristic. To this end, we set and let and be projections on respectively the first, second and -th coordinate. If no one has the characteristic we set to encode that we could not find a suspect (which is only the case if ). If at least one individual has the characteristic, it is clear that is uniformly selected from the subset of individuals that have the trait. Let . For such that , , and we set our prior basic belief assignment as follows:

$$m(S=x, \Gamma_1=y_1, \ldots, \Gamma_{N+1}=y_{N+1}) = \frac{1}{k}\,p^{k}(1-p)^{N+1-k} \tag{3.16}$$

and

$$m(S=*, \Gamma_1=\Gamma_2=\cdots=\Gamma_{N+1}=0) = (1-p)^{N+1}.$$

Note that there are two differences compared to the cold case: we have to assume that $y_x = 1$, and we have to divide by $k$ rather than by $N+1$.

Next we link the characteristic to the crime and we condition on $H_s$. Note that the conditioning does not contain information about the length of the search or the identity of the searched individuals. We only know that $s$ was the first one to be found with the characteristic. We have, for $y$ such that $y_s = 1$,

$$\{S=s;\ \Gamma_1=y_1;\ \ldots;\ \Gamma_{N+1}=y_{N+1}\}\cap H_s = \{C\in\{i : y_i=1\},\ S=s;\ \Gamma_1=y_1, \ldots, \Gamma_{N+1}=y_{N+1}\}.$$

Note that the sets in (3.3) are the only subsets of with positive mass that have a nonempty intersection with . Hence the normalization follows from the total mass of such sets, which is equal to

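$$\begin{aligned}
M_s &:= \sum_{j=0}^{N}\binom{N}{j}\frac{1}{j+1}\,p^{j+1}(1-p)^{N-j} \\
&= \sum_{j=0}^{N}\frac{1}{N+1}\binom{N+1}{j+1}p^{j+1}(1-p)^{N-j} \\
&= \frac{1-(1-p)^{N+1}}{N+1}.
\end{aligned} \tag{3.18}$$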

Notice that we can also derive (3.18) by observing that $M_s$ does not depend on $s$, and thus $(N+1)M_s$ plus $(1-p)^{N+1}$ adds up to 1.
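Both routes to (3.18) are easy to confirm numerically. A minimal sketch, with a hypothetical function name, that evaluates the defining sum directly:

```python
from math import comb


def total_mass_search(N, p):
    """M_s from the first line of (3.18), summed term by term."""
    return sum(
        comb(N, j) * p ** (j + 1) * (1 - p) ** (N - j) / (j + 1)
        for j in range(N + 1)
    )
```

For given `N` and `p`, the value agrees with `(1 - (1 - p) ** (N + 1)) / (N + 1)`, and `(N + 1) * total_mass_search(N, p) + (1 - p) ** (N + 1)` equals 1 up to rounding.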

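Hence for $A$ such that $s \in A$ and $|A| = k+1$ we have the posterior

$$m_{H_s}\bigl(C\in A,\ S=s,\ \Gamma_1=\mathbf{1}_A(1);\ \ldots,\ \Gamma_{N+1}=\mathbf{1}_A(N+1)\bigr) = \frac{p^{k+1}(1-p)^{N-k}}{k+1}\cdot\frac{N+1}{1-(1-p)^{N+1}}. \tag{3.19}$$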

Now we can compute our belief in certain events. For instance, our belief in $C = s$ is given by

$$\mathrm{Bel}_{H_s}(C=s) = m_{H_s}\Bigl(C=s,\ S=s,\ \Gamma_s=1,\ \bigcap_{i\neq s}\{\Gamma_i=0\}\Bigr) = \frac{p(1-p)^{N}(N+1)}{1-(1-p)^{N+1}}, \tag{3.20}$$

which differs from the cold case. There is a very natural interpretation of the expression in (3.20). The numerator is the probability that a binomially distributed random variable with parameters $N+1$ and $p$ is equal to 1. The denominator is the probability that this random variable is positive, so (3.20) is the conditional probability that such a random variable equals 1 given that it is positive. This makes sense, since we can only know for sure that $C = s$ when $s$ is the only one with the characteristic. Notice that (3.20) equals zero if $p = 1$, and if $p < 1$ we can rewrite (3.20) as

$$\frac{N+1}{\sum_{k=0}^{N}(1-p)^{-k}}. \tag{3.21}$$
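The three descriptions of the search-case belief (the closed form (3.20), the binomial interpretation, and the rewritten form (3.21)) can be compared numerically. A minimal sketch with hypothetical function names, assuming $0 < p < 1$:

```python
from math import comb


def belief_search_case(N, p):
    """Closed form (3.20)."""
    return (N + 1) * p * (1 - p) ** N / (1 - (1 - p) ** (N + 1))


def belief_search_case_binomial(N, p):
    """P(X = 1 | X > 0) for X ~ Binomial(N + 1, p)."""
    n = N + 1
    p_exactly_one = comb(n, 1) * p * (1 - p) ** (n - 1)
    p_positive = 1 - (1 - p) ** n
    return p_exactly_one / p_positive


def belief_search_case_rewritten(N, p):
    """Rewritten form (3.21), valid for p < 1."""
    return (N + 1) / sum((1 - p) ** (-k) for k in range(N + 1))
```

All three functions return the same value up to floating-point error.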

As already mentioned above, there is the alternative approach in which we deduce from the characteristic found at the crime scene that at least one individual has the characteristic. In this case we do not need the extended since has probability zero. For such that and , , the basic belief assignment is now given by

$$m'(C\in\{i : y_i=1\},\ S=x,\ \Gamma_1=y_1, \ldots, \Gamma_{N+1}=y_{N+1}) = \frac{1}{k+1}\cdot\frac{p^{k+1}(1-p)^{N-k}}{1-(1-p)^{N+1}}.$$

After this, we condition on . It is easily seen that the normalizing factor is just , and this immediately leads to the same formula as in (3.20).

In the classical case, starting with a uniform probability distribution, the posterior probability that $C = s$ is equal to

$$\frac{1-(1-p)^{N+1}}{(N+1)p} = \frac{1}{N+1}\sum_{j=0}^{N}(1-p)^{j}, \tag{3.22}$$

see e.g. . Note that (3.22) is the arithmetic mean of $(1-p)^{j}$, $j = 0, \ldots, N$, while (3.21) is the harmonic mean of the same sequence. Hence the answer using our approach is, as it was in the cold case, smaller than the classical answer.
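The arithmetic-versus-harmonic-mean comparison can be made concrete as follows. This is an illustrative sketch; the function names are assumptions, and $0 < p < 1$ is assumed throughout.

```python
def classical_search_case(N, p):
    """Arithmetic mean of (1 - p)^j for j = 0..N, formula (3.22)."""
    return sum((1 - p) ** j for j in range(N + 1)) / (N + 1)


def belief_search_case(N, p):
    """Harmonic mean of the same sequence, formula (3.21)."""
    return (N + 1) / sum((1 - p) ** (-j) for j in range(N + 1))


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    for N, p in [(10, 0.1), (100, 0.01)]:
        print(N, p, belief_search_case(N, p), classical_search_case(N, p))
```

Since the harmonic mean of positive numbers never exceeds their arithmetic mean, the belief-function answer is at most the classical one, with equality only when $N = 0$.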

We briefly demonstrate that we can also derive this classical result with our technology. In the classical case, the basic belief assignment is given by

$$m^c(C=c, S=x, \Gamma_1=y_1, \ldots, \Gamma_{N+1}=y_{N+1}) = \frac{1}{k}\cdot\frac{1}{N+1}\,p^{k}(1-p)^{N+1-k}, \tag{3.23}$$

whenever , and

$$m^c(S=*, \Gamma_1=\Gamma_2=\cdots=\Gamma_{N+1}=0) = (1-p)^{N+1}. \tag{3.24}$$

Note that the corresponding belief function is a probability measure, since only singletons have positive basic belief assignments. We condition on as usual. To compute the (classical) belief in conditioned on we need to compute the conditional basic belief assignment , for which we need the correct normalizing constant and the total mass of sets in (3.23) whose intersection with is not empty. Elementary combinatorics gives that the latter is equal to

$$\sum_{k=0}^{N}\binom{N}{k}\frac{1}{k+1}\cdot\frac{1}{N+1}\,p^{k+1}(1-p)^{N-k} = \frac{1-(1-p)^{N+1}}{(N+1)^{2}}. \tag{3.25}$$

The normalizing constant follows from the total mass assigned to sets whose intersection with $H_s$ is non-empty. For this intersection to be non-empty, we need that $S = s$, $\Gamma_s = 1$ and $\Gamma_C = 1$. In the classical probabilistic experiment described by $m^c$, we simply need to compute the probability that $H_s$ occurs. We need subsequently (1) $\Gamma_C = 1$, and (2) $s$ is chosen (implying that $\Gamma_s = 1$). The first step occurs with probability $p$. Given this, we know that not all labels are 0, and every individual has the same probability to be chosen. Hence, the conditional probability of Step (2) given Step (1) is simply $1/(N+1)$. It follows that the correct normalizing constant is $p/(N+1)$. Combining this with (3.25) yields (3.22).
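The classical search-case result can also be checked by enumerating the singletons of (3.23). The sketch below assumes, as read from the surrounding text, that conditioning keeps exactly the outcomes in which `s` is the selected individual and both `s` and the criminal carry the trait; the function name and 0-based indexing are hypothetical.

```python
from itertools import product


def classical_search_posterior(N, p, s=0):
    """P(C = s | H_s) for the classical search-case prior (3.23)."""
    n = N + 1
    num = den = 0.0
    for y in product((0, 1), repeat=n):
        if y[s] != 1:
            continue  # s can only be found if s carries the trait
        k = sum(y)  # k >= 1 here, so the division below is safe
        # Mass of each singleton {C = c, S = s, Gamma = y} in (3.23).
        mass = p ** k * (1 - p) ** (n - k) / (k * n)
        for c in range(n):
            if y[c] == 1:  # the criminal must carry the trait
                den += mass
                if c == s:
                    num += mass
    return num / den
```

The numerator accumulates the mass in (3.25), the denominator accumulates $p/(N+1)$, and their ratio reproduces (3.22) for small `N`.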

## 4 Parental identification

Our next example concerns the situation in which we have a known mother and a known child, but we do not know who the father is. We assume that there is a set of potential fathers. We would like to make belief statements about the possible paternity of someone chosen from the population, based on the DNA profile of this chosen person. To keep things as simple as possible, we consider only one specific locus of the DNA. Furthermore, we assume that the alleles of mother and child at that locus are such that we know what the paternal allele must be. Every potential father has two alleles at the locus, and we denote by $\Gamma_i$ the number of ‘correct’ alleles of individual $i$, that is, alleles matching the paternal allele; hence $\Gamma_i \in \{0, 1, 2\}$.

In order to set up our prior belief function, we set and let , and be projections on respectively the first, second, third and -th coordinate. represents the father, and the selected individual which is the putative father. The indicator is associated with the putative father, and is only relevant if the putative father has exactly one ‘correct’ allele; in that case means that he passes on this correct allele to his child, while indicates that he does not do that. Let and be the probabilities that an individual has respectively alleles of the right type. We assume that we know these probabilities from population surveys. Set for . Then the prior basic belief assignment is given by

$$m(S=x, A=a, \Gamma_1=y_1, \ldots, \Gamma_{N+1}=y_{N+1}) = \frac{1}{2}\cdot\frac{1}{N+1}\,p_0^{k_0}p_1^{k_1}p_2^{k_2}. \tag{4.1}$$

As before, this basic belief assignment is a summary of what we know, and is formulated in terms of items that can be well described by classical probabilities. The factor $\tfrac{1}{2}$ comes from the fact that a father passes a randomly chosen allele to his child. Note that the basic belief assignment in (4.1) only assigns positive basic belief to sets that contain no information whatsoever about the father, other than the fact that he belongs to the given population. We denote the event in (4.1) by $E(x, a, y_1, \ldots, y_{N+1})$.

Now we have to distinguish between two scenarios: the suspect in question has one or two alleles of the right type. We first consider the case in which he has two such alleles. This means we want to condition on

$$H_{s,2} = \bigl\{S=s,\ \Gamma_S=2,\ \{\Gamma_F=2\}\cup\{\Gamma_F=1, A=1\}\bigr\}. \tag{4.2}$$

We have to assign the mass originally assigned to $E(x, a, y_1, \ldots, y_{N+1})$ to the intersection of this set with $H_{s,2}$, and normalize suitably. This intersection is non-empty precisely when $x = s$ and $y_s = 2$, and as before, the total mass of these sets can be calculated by viewing the basic belief assignment in (4.1) as a classical probabilistic experiment, leading to a normalizing constant of

$$\frac{p_2}{N+1}. \tag{4.3}$$

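It follows that if $y_s = 2$ we have the posterior

$$m_{H_{s,2}}\bigl(E(s, a, y_1, \ldots, y_{N+1})\cap H_{s,2}\bigr) = \frac{1}{2}\cdot\frac{1}{N+1}\,p_0^{k_0}p_1^{k_1}p_2^{k_2}\cdot\frac{N+1}{p_2} = \frac{1}{2}\,p_0^{k_0}p_1^{k_1}p_2^{k_2-1}.$$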

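This basic belief assignment leads to the following posterior belief function. Let $B$ be such that $s \in B$. If $A = 1$, then we are back in the classical cold case of the island problem, with $p$ replaced by $p_1 + p_2$, since one correct allele is enough now:

$$\begin{aligned}
\mathrm{Bel}_{H_{s,2}}(F\in B, A=1) &= \frac{1}{2}\sum_{k=0}^{|B|-1}\binom{|B|-1}{k}\Biggl[\sum_{k_1=0}^{k}\binom{k}{k_1}p_1^{k_1}p_2^{k-k_1}\Biggr]p_0^{N-k} \\
&= \frac{1}{2}\sum_{k=0}^{|B|-1}\binom{|B|-1}{k}(p_1+p_2)^{k}\,p_0^{N-k} \\
&= \frac{1}{2}\,p_0^{N+1-|B|}.
\end{aligned} \tag{4.4}$$

Here $k$ counts the individuals in $B \setminus \{s\}$ with at least one correct allele, $k_1$ of whom have exactly one; the remaining $N-k$ individuals have none.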

If $A = 0$ then the situation reduces to the cold case of the island problem with $p = p_2$, since a potential father now needs two correct alleles. By an analogous computation we obtain:

$$\mathrm{Bel}_{H_{s,2}}(F\in B, A=0) = \frac{1}{2}(p_0+p_1)^{N+1-|B|}. \tag{4.5}$$

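Hence

$$\begin{aligned}
\mathrm{Bel}_{H_{s,2}}(F\in B) &= \mathrm{Bel}_{H_{s,2}}(F\in B, A=1) + \mathrm{Bel}_{H_{s,2}}(F\in B, A=0) \\
&= \frac{1}{2}\Bigl(p_0^{N+1-|B|} + (p_0+p_1)^{N+1-|B|}\Bigr).
\end{aligned} \tag{4.6}$$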

In the special case that $B = \{s\}$, we find

$$\mathrm{Bel}_{H_{s,2}}(F=s) = \frac{1}{2}\,p_0^{N} + \frac{1}{2}(p_0+p_1)^{N}. \tag{4.7}$$
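The closed form (4.6) can be compared against a direct enumeration of the prior (4.1) for small populations. The sketch below makes its assumptions explicit: after conditioning on $H_{s,2}$, a focal set with $A = 1$ leaves as candidate fathers everyone with at least one correct allele, and with $A = 0$ everyone with two. The function names and 0-based indexing are hypothetical.

```python
from itertools import product


def belief_father_in_B(N, p0, p1, p2, B, s=0):
    """Bel_{H_{s,2}}(F in B), by enumerating the focal sets of (4.1)."""
    n = N + 1
    probs = (p0, p1, p2)
    target = set(B)
    total = bel = 0.0
    for a in (0, 1):
        for y in product((0, 1, 2), repeat=n):
            if y[s] != 2:
                continue  # the selected individual has two correct alleles
            mass = 0.5 / n
            for yi in y:
                mass *= probs[yi]
            total += mass
            needed = 1 if a == 1 else 2  # alleles a candidate father needs
            candidates = {i for i in range(n) if y[i] >= needed}
            if candidates <= target:
                bel += mass
    return bel / total


def belief_father_closed_form(N, p0, p1, size_B):
    """Closed form (4.6); p2 does not appear in the result."""
    e = N + 1 - size_B
    return 0.5 * (p0 ** e + (p0 + p1) ** e)
```

With `s` in `B`, both functions agree up to rounding, and taking `B = {s}` recovers (4.7).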

Next we consider the case in which we condition on

$$H_{s,1} = \bigl\{S=s,\ \Gamma_S=1,\ \{\Gamma_F=2\}\cup\{\Gamma_F=1;\ A=1\}\bigr\}. \tag{4.8}$$

Now can only be non-empty if and , which happens with probability