Conservative, Proportional and Optimistic Contextual Discounting in the Belief Functions Theory

12/19/2013 ∙ by Marek Kurdej, et al. ∙ Université de Technologie de Compiègne

Information discounting plays an important role in the theory of belief functions and, more generally, in information fusion. Nevertheless, neither classical uniform discounting nor contextual discounting can model certain use cases, notably temporal discounting. In this article, new contextual discounting schemes (conservative, proportional and optimistic) are proposed. Some properties of these discounting operations are examined. Classical discounting is shown to be a special case of these schemes. Two motivating cases are discussed: modelling of source reliability and application to temporal discounting.


I Introduction

In many information fusion problems, the reliability of a source has to be taken into account [1]. The meta-knowledge about reliability may depend only on the source, but it may also vary with the type of evidence. While the first case is easily handled by the classical discounting operation [2], the second is more complex, and existing solutions do not cover all possible use cases [3, 4]. In this article, we address this problem in the context of the theory of belief functions, also known as Dempster–Shafer theory [5, 2], by proposing three schemes for contextual discounting: conservative, proportional and optimistic.

Information fusion is largely concerned with combining sensor data that arrive successively over time. Past information is often useful and should not be discarded. However, one cannot disregard the fact that the information may be worth less and less over time. In order to handle this variation in the subjective value of a piece of information, we apply the proposed discounting operations to temporal discounting.

The closest work, and the starting point for this article, was carried out by Mercier et al., who presented the original idea of contextual discounting [6, 3]. This research has been further developed, and generalised contextual discounting and reinforcement have been described as examples of correction mechanisms for belief functions [4, 7]. Pichon et al. studied information correction schemes and proposed a strategy taking into account the source’s relevance and truthfulness [8]. Other mechanisms of data revision have been studied in the context of evidence theory. A review of existing revision rules can be found in [9], along with an extension of one of them that is able to cope with inconsistency between prior and input information.

The rest of this paper is organised as follows. The existing concepts of discounting in the theory of belief functions are first recalled in Section II. Next, Section III presents the details of the proposed schemes. The behaviour of the rules and their properties are described in Section IV, while some simple examples are given in Section V. A case study on the application of the proposed method to temporal discounting is the subject of Section VI. We conclude the paper and outline perspectives for future research in Section VII.

II Belief functions theory

II-A Fundamentals

The information obtained from source $S$ concerning the actual value taken by variable $X$ is quantitatively described by a basic belief assignment (bba) $m^{\Omega}_{S}\{X\}$. Variable $X$ takes values in a finite set $\Omega$, which is called the frame of discernment (fod). The bba is defined as a function from $2^{\Omega}$ to the interval $[0, 1]$ satisfying the condition:

$$\sum_{A \subseteq \Omega} m^{\Omega}_{S}\{X\}(A) = 1 \tag{1}$$

The notation $m^{\Omega}_{S}\{X\}$ will be further simplified to $m^{\Omega}$ or $m$ when no ambiguity is possible. Total ignorance about the variable is represented by a vacuous bba, for which $m(\Omega) = 1$. Additionally, a mass function satisfying $m(\emptyset) = 0$ will be called normal or regular, whereas one not fulfilling this condition will be called subnormal.

In the following sections, the disjunctive rule of combination (DRC) will be used. The DRC may be used to combine two distinct pieces of evidence $m_1$, $m_2$ under the assumption that at least one of the two information sources is reliable [10]. The DRC is defined by:

$$m_{1 \cup 2}(A) = \sum_{B \cup C = A} m_1(B)\, m_2(C), \quad \forall A \subseteq \Omega \tag{2}$$
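As an illustration of the disjunctive rule, the following minimal Python sketch assigns each product of masses to the union of the two focal sets involved. The representation (a dict from `frozenset` to mass), the function name and the numeric values are assumptions made for this example only.

```python
from itertools import product


def disjunctive_combination(m1, m2):
    """Combine two bbas disjunctively: mass products go to the union of the focal sets."""
    combined = {}
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        union = b | c
        combined[union] = combined.get(union, 0.0) + mass_b * mass_c
    return combined


# Hypothetical frame and masses, chosen only for illustration.
omega = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, omega: 0.4}
m2 = {frozenset({"b"}): 0.5, omega: 0.5}
m12 = disjunctive_combination(m1, m2)
```

In this toy case the mass ends up on `{a, b}` and on the whole frame, which shows how the disjunctive rule only ever moves belief towards larger, less specific sets.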

A basic belief assignment can be expressed not only by the mass function $m$; there are equivalent functions representing the same information. One of them is the belief function $bel$, which in the Transferable Belief Model (TBM) [11] takes the form:

$$bel(A) = \sum_{\emptyset \neq B \subseteq A} m(B), \quad \forall A \subseteq \Omega \tag{3}$$

II-B Classical discounting

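The most commonly used form of the discounting operation, given a discount factor $\alpha \in [0, 1]$, was proposed by Shafer in [2, pp. 251–255] and will subsequently be called classical discounting:

$${}^{\alpha}bel(A) = (1 - \alpha)\, bel(A), \quad \forall A \subsetneq \Omega \tag{4}$$

which can be expressed equivalently using mass functions as:

$${}^{\alpha}m(A) = (1 - \alpha)\, m(A), \quad \forall A \subsetneq \Omega \tag{5}$$
$${}^{\alpha}m(\Omega) = (1 - \alpha)\, m(\Omega) + \alpha \tag{6}$$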
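Classical discounting in its usual mass-function form can be sketched as follows: every mass is scaled by one minus the discount factor and the removed part is added to the whole frame. The dict-based representation, the function name and the numbers are assumptions made for illustration.

```python
def classical_discount(m, omega, alpha):
    """Scale every mass by (1 - alpha) and move the removed share to the frame omega."""
    discounted = {A: (1.0 - alpha) * mass for A, mass in m.items() if A != omega}
    discounted[omega] = (1.0 - alpha) * m.get(omega, 0.0) + alpha
    return discounted


# Hypothetical example values.
omega = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.6, omega: 0.4}
m_disc = classical_discount(m, omega, alpha=0.25)
```

With a discount factor of 0 the bba is unchanged, and with a factor of 1 it becomes the vacuous bba.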

II-C Contextual discounting

Contextual discounting, an extension of classical discounting that takes into account reliabilities varying between classes, was proposed by Mercier et al. [6] and developed in [3, 4, 7]. This operation uses a vector of discount factors attributed to the elements of a partition of the frame of discernment, i.e.:

(7)
(8)
(9)

Contextual discounting of a bba is equal to:

(10)
(11)

where each , , is defined by:

(12)

One drawback of this method is that reliability factors are attributed to a partition of the frame of discernment, which excludes cases where reliability is known for intersecting subsets of the frame.

II-D Generalised contextual discounting

The aforementioned problem has been addressed in [4, 7], where generalised contextual discounting is proposed as a correction mechanism. Again, a vector of discount factors is used, but here the factors can also be defined for intersecting sets. The method employs the canonical disjunctive decomposition of a subnormal bba introduced by Denœux [12]. The idea is to discount the disjunctive weights of such a decomposition of the bba:

(13)

where is a negative generalised simple bba (NGSBBA) [12] defined from to by:

(14)
(15)
(16)

III Conservative, optimistic and proportional discounting

As a starting point for the design of a discounting operation, a few hypotheses have been set. First and foremost, the information source is supposed to excessively favour a given set of solutions, and the mass of this set should therefore be discounted by the factor corresponding to it. The behaviour of the new discounting operation should be close to that of classical discounting. The mass of conflict shall be discounted, and the new schemes should generalise the classical one. Moreover, setting a non-zero discount factor for a set should entail the discounting of the mass attributed to that set, whereas the masses of sets having no elements in common with it should remain unchanged (except for the frame of discernment, since masses are transferred to this set). This behaviour is the opposite of the contextual discounting proposed by Mercier et al. [6], which retains the mass attributed to the set in question and discounts the other sets. We judge the latter counter-intuitive, especially when there are many classes, although it is well justified and conforms to the interpretation proposed there (see [3, Example 2]). Finally, we postulate that the discounted mass of a set should be transferred to the whole frame of discernment and not to another of its supersets that is a proper subset of the frame, since doing so would imply additional knowledge about the state of the represented entity.

III-A Notation

In the following sections, we will stick to similar notation as in Section II. In order to distinguish proposed discounting operations between them and to avoid any confusion with existing schemes, will denote conservative discounting of a bba using discount rate vector defined for all elements of . Similarly, will represent proportional discounting and — optimistic discounting.

When the set of classes for which discount factors are defined is obvious or unimportant, notation will be equivalent to . Analogical convention will be used for other types of discounting. Equally, we simplify the notation by omitting the type of discounting ( for conservative, for proportional or for optimistic) if an equation is valid for all types. Finally, set will be denoted and will refer to the discount rate defined for set , given that , .

III-B Conservative discounting

Conservative discounting takes a pessimistic approach to discounting. As stated before, the mass that the source attributes to the set in question is excessive and should be discounted by the corresponding factor. Let us now suppose that some meta-knowledge additionally states that the assignment of masses to supersets of this set by the source is highly dependent on it. Bearing this in mind, the mass attributed to such a superset should be discounted in the same manner as the mass of the set itself. Therefore, in conservative discounting, the set in question, the empty set and all sets having at least one element in common with it are discounted by the same factor.

Generalising this behaviour to any , one obtains:

(17)
(18)
(19)

Note that the most discounted mass is that of the empty set, which is affected by all discount rates.
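The verbal description of conservative discounting can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: it assumes a bba stored as a dict from `frozenset` to mass, discount rates stored as a dict from each discounted set to its rate, and that all removed mass is moved to the whole frame. The function name, the frame and the numbers are hypothetical.

```python
from math import prod


def conservative_discount(m, omega, alphas):
    """Discount every set that intersects a rated set theta (and the empty set) by (1 - alpha)."""
    discounted = {}
    for A, mass in m.items():
        if A == omega:
            continue  # the frame receives the removed mass below
        # The empty set is hit by every rate; other sets by the rates of the thetas they intersect.
        factor = prod(1.0 - alpha for theta, alpha in alphas.items() if not A or A & theta)
        discounted[A] = factor * mass
    discounted[omega] = 1.0 - sum(discounted.values())
    return discounted


# Hypothetical three-class frame and masses.
omega = frozenset({"a", "h", "r"})
m = {frozenset({"a"}): 0.4, frozenset({"r"}): 0.3, frozenset({"a", "r"}): 0.2, omega: 0.1}
alphas = {frozenset({"h", "r"}): 0.4}
m_cons = conservative_discount(m, omega, alphas)
```

Here `{r}` and `{a, r}` both share an element with the rated set `{h, r}` and are therefore scaled by 0.6, while `{a}` is left untouched.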

III-C Optimistic discounting

Optimistic discounting is based on a hypothesis opposite to the one made in conservative discounting. This time, the meta-information about the source asserts that the masses of supersets of the set in question are assigned independently of it. These masses shall therefore not be discounted by the corresponding factor. On the other hand, all subsets of the set will be affected in the same way as the set itself.

This type of discounting can be expressed for any by:

(20)
(21)
(22)
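A corresponding sketch of optimistic discounting is given below. It follows the verbal rule only (a set is discounted by a rate when it is a subset of the rated set) and assumes, as an interpretation, that sets which merely overlap a rated set are left untouched. The dict-based representation, the function name and the example values are hypothetical.

```python
from math import prod


def optimistic_discount(m, omega, alphas):
    """Discount only the subsets of each rated set theta (the empty set included)."""
    discounted = {}
    for A, mass in m.items():
        if A == omega:
            continue  # the frame receives the removed mass below
        factor = prod(1.0 - alpha for theta, alpha in alphas.items() if A <= theta)
        discounted[A] = factor * mass
    discounted[omega] = 1.0 - sum(discounted.values())
    return discounted


# Hypothetical three-class frame and masses.
omega = frozenset({"a", "h", "r"})
m = {frozenset({"a"}): 0.4, frozenset({"r"}): 0.3, frozenset({"a", "r"}): 0.2, omega: 0.1}
alphas = {frozenset({"h", "r"}): 0.4}
m_opt = optimistic_discount(m, omega, alphas)
```

Only `{r}` is a subset of the rated set `{h, r}`, so it is the only mass that is reduced in this example.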

III-D Proportional discounting

The schemes proposed above represent two extremes of discounting strategies. The conservative one demonstrates very cautious or even overcautious behaviour, which can be summarised as: in case of doubt, do not exclude any possibilities. Indeed, discounting all supersets in the same way as the set in question means that one accepts the possibility that the mass of a superset corresponds entirely to one of its constituents, which, incidentally, has been overestimated and should hence be discounted. Conversely, when one assumes that the mass of a superset depends on a set that has not been excessively evaluated, optimistic discounting is used. Such behaviour can be seen as optimistic or bold, because any doubt about whether to discount a particular set or not implies a negative answer.

Since the above schemes are the extreme cases, the need for an in-between solution arises naturally. One way to achieve this without resorting to mass-dependent computation is to weight the discount rate by some measure of dependence between a set and its supersets. The most straightforward one is the inclusion criterion, which measures the ratio between the cardinalities of the set and the superset. On the basis of this idea, proportional discounting is expressed by:

(23)
(24)
(25)
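One possible reading of the cardinality-ratio idea is sketched below. It is an assumption rather than a transcription of the equations above: each rate is weighted by the share of the discounted set that lies inside the rated set, so that subsets of the rated set get the full rate and disjoint sets get none. Treating the empty set as fully covered is also an assumption. The representation, function name and numbers are hypothetical.

```python
def proportional_discount(m, omega, alphas):
    """Weight each rate alpha by |A & theta| / |A| before discounting the mass of A."""
    discounted = {}
    for A, mass in m.items():
        if A == omega:
            continue  # the frame receives the removed mass below
        factor = 1.0
        for theta, alpha in alphas.items():
            # Share of A covered by theta; the empty set counts as fully covered (assumption).
            share = len(A & theta) / len(A) if A else 1.0
            factor *= 1.0 - alpha * share
        discounted[A] = factor * mass
    discounted[omega] = 1.0 - sum(discounted.values())
    return discounted


# Hypothetical three-class frame and masses.
omega = frozenset({"a", "h", "r"})
m = {frozenset({"a"}): 0.4, frozenset({"r"}): 0.3, frozenset({"a", "r"}): 0.2, omega: 0.1}
alphas = {frozenset({"h", "r"}): 0.4}
m_prop = proportional_discount(m, omega, alphas)
```

In this example half of `{a, r}` lies inside the rated set `{h, r}`, so its mass is reduced by half of the rate, which places the result between the optimistic and conservative outcomes.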

IV Properties

IV-A Generalisation of classical discounting

The proposed discounting schemes reduce to classical discounting when a single discount factor is defined for the whole frame of discernment. This follows simply from the fact that, for any set to which a discount factor is attributed, all its subsets get discounted. Since all sets are subsets of the frame, all of them are affected in the same way (except for the frame itself, as expected).

IV-B Order invariance

The result of the discounting operations over different classes is invariant to the order of these operations, for conservative, optimistic and proportional discounting alike. The proof is omitted here, as it is trivial and follows from the commutativity of multiplication.

(26)
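Order invariance can be checked numerically with a short sketch. It applies two single-set discounting steps in both orders, for the conservative and optimistic rules as described verbally in Section III. The helper names, the frame and the numbers are assumptions made for illustration, and the proportional scheme is left out only to keep the example short.

```python
from math import isclose


def discount_by(m, omega, theta, alpha, hits):
    """One discounting step for a single rated set theta; hits(A, theta) decides if A is affected."""
    out = {A: (1.0 - alpha if hits(A, theta) else 1.0) * mass
           for A, mass in m.items() if A != omega}
    out[omega] = 1.0 - sum(out.values())
    return out


def hits_conservative(A, theta):
    return not A or bool(A & theta)


def hits_optimistic(A, theta):
    return A <= theta


def same_bba(m1, m2):
    return m1.keys() == m2.keys() and all(isclose(m1[A], m2[A], abs_tol=1e-12) for A in m1)


# Hypothetical frame, masses and rated sets.
omega = frozenset({"a", "h", "r"})
m = {frozenset({"a"}): 0.4, frozenset({"r"}): 0.3, frozenset({"a", "r"}): 0.2, omega: 0.1}
t1, a1 = frozenset({"h", "r"}), 0.4
t2, a2 = frozenset({"a"}), 0.1

results = {}
for name, hits in (("conservative", hits_conservative), ("optimistic", hits_optimistic)):
    first_t1 = discount_by(discount_by(m, omega, t1, a1, hits), omega, t2, a2, hits)
    first_t2 = discount_by(discount_by(m, omega, t2, a2, hits), omega, t1, a1, hits)
    results[name] = (first_t1, first_t2)
```

Both orders produce the same bba up to floating-point rounding because each mass is only multiplied by a product of factors, and the mass of the frame is recomputed from the others.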

IV-C Operation grouping

Table I: Comparative table of the proposed discounting methods.

For all the proposed schemes, the result of two discounting operations on sets , and discount rate vectors , done one after another is equal to a single discounting operation on combined discount rate vector .

(27)

This property can be easily generalised for any number of discounting operations.

(28)

given that

(29)
(30)

and under the following condition:

(31)

V Examples

V-A Example 1: comparison

Let and let be a bba defined on . Table I presents the results yielded by the proposed discounting schemes with and discount rate vector (the fact that represents a partition of is insignificant, since it could be any subset of ). For clarity, we use . It is noteworthy that the proposed discounting operations can be arranged in increasing order of total discounted mass: optimistic, proportional, conservative. For all mass functions and all discount rate vectors, the following equation holds:

(32)

V-B Example 2: source reliability modelling

Let us consider an example of a simplified aerial target recognition problem borrowed from [13, 3]. The frame of discernment contains three classes: air-plane, helicopter and rocket. The sensor provides us with a bba hesitating between classifying the target as an air-plane or a rocket:

(33)

Let us now consider that the sensor is over-reliable when the source is a helicopter or a rocket with plausibility , while being reliable when the target is an air-plane. The conservatively discounted bba is:

(34)

Note that a fraction of the mass attributed to has been transferred to , which can be interpreted as follows: if the target is a helicopter or a rocket, then the source is over-reliable and it might have excessively quantified its belief about the target being a helicopter, a rocket or either of the two. Thus, the target reported as a rocket may in reality be of another type.

For completeness, the optimistically discounted bba and the proportionally discounted bba are:

(35)
(36)

A similar example, with , using contextual discounting cited from Mercier [3, Example 2, Case 1] gives:

(37)

This shows that the behaviour is almost the inverse of conservative and proportional discounting and different from optimistic discounting. Namely, when the discount factor is set to the same value but attributed to the complement set, the resulting mass function is identical.

VI Case study: temporal discounting

In this section, an application to temporal discounting is studied. The principal idea behind this discounting is the fact that a piece of information becomes partially obsolete with time. This can happen because the entity described by this particular information is dynamic, changes or is not observed any more. It is important to underline that different pieces of information become obsolete at possibly different rates. This example motivates why there is a need for introducing new contextual discounting schemes and why the existing one is not sufficient. The first part demonstrates some postulates about temporal discounting itself. Next, the existing contextual discounting scheme is applied to temporal discounting. Finally, the application of the proposed methods is demonstrated.

VI-A Postulates

The postulates stated below imply that temporal discounting should follow exponential decay, similarly to the process of radioactive decay described by Ernest Rutherford in the early 1900s [14]. Indeed, we opt for the solution where the information “decays”, i.e. a piece of information gradually becomes obsolete.

In the following paragraphs, will denote a set, , about the reliability of which an additional piece of knowledge is available.

(38)
(39)
(40)

VI-A1 Half-life time

The mass attributed to a piece of information is two times smaller than the initial mass after half-life time . Thanks to this postulate, one can compare the persistence of different information types by comparing their half-life times. As far as different information persistence measures are considered, it is noteworthy that choosing “life expectancy” (mean time after which a piece of information becomes completely irrelevant) would prohibit the use of exponential functions and so entail some complications.

(41)

More generally, -life time : the mass attributed to a piece of information represents one-th of the initial mass after time .

(42)

VI-A2 Order invariance

The result of discounting is independent of the order of operations.

(43)

VI-A3 Only age-dependent

The discounted mass value depends only on the age of the information and not on the number of discounting operations. Indeed, it is desirable that the frequency at which a piece of information gets discounted does not change the final result.

(44)
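The half-life and age-only postulates can be illustrated with a small sketch of exponential decay. The formula used here, a retained fraction of one half raised to the elapsed time divided by the half-life, is the standard half-life expression and is assumed to match the postulates. The function names and numeric values are hypothetical.

```python
from math import isclose


def retained_fraction(elapsed, half_life):
    """Fraction of the initial mass that remains after `elapsed` time units."""
    return 0.5 ** (elapsed / half_life)


def temporal_discount_rate(elapsed, half_life):
    """Discount rate to apply for a step of length `elapsed`."""
    return 1.0 - retained_fraction(elapsed, half_life)


half_life = 4.0  # hypothetical half-life, in arbitrary time units

# One discounting step covering a full half-life...
one_step = retained_fraction(4.0, half_life)
# ...and two unequal steps covering the same total age.
two_steps = retained_fraction(1.5, half_life) * retained_fraction(2.5, half_life)
```

Both `one_step` and `two_steps` leave half of the initial mass, which is the age-only behaviour required above: splitting the elapsed time into several discounting steps does not change the result.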

VI-B Temporal discounting using contextual discounting

This section will present an attempt to use contextual discounting as presented by Mercier [6, 3] and a counter-example demonstrating that this discounting scheme is not adapted for this aim.

VI-B1 -discounting

As presented in Section II-C, having defined partition of the frame of discernment and discounting rate vector for all elements of , discounted mass function is computed as follows:

(45)

VI-B2 Direct computation of discounting mass function

Instead of calculating discounting mass function by applying the disjunctive operator, one can compute it directly using [6, Proposition 7]:

(46)

VI-B3 Direct computation of discounted mass function

Once again, the discounted mass function can be computed directly using the results from Equations 10 and 46, which yields (assuming that no discount rate has been defined for the empty set):

(47)

VI-B4 Simplified computation of a discounted mass function

Let us suppose that is a normal mass function, which enables us to simplify Equation 47 for singletons to:

(48)
A
0 0 0 0
0.3
0.2
0.2
0.2
0 0 0 0
0 0 0 0
0.1 0.38499 0.4396775 0.49124
Table II: Temporal discounting using the proposed discount schemes. Case 1.
A
0 0 0 0 0
0.3 0.1723
0.2 0.1
0.2 0.1391
0.2 0.1662
0 0 0 0 0.15
0 0 0 0 0.074
0.1 0.53853 0.6316025 0.69596 0.1983
Table III: Temporal discounting using the proposed discount schemes. Case 2.
Result of Mercier’s contextual discounting in the rightmost column.

VI-B5 Use for temporal discounting

In order to calculate the discount rates of contextual discounting from the parameters of temporal discounting, let us compare side by side temporal discounting (Equation 38), as obtained from the postulates stated above:

(49)

with the simplified expression of contextually discounted mass (Equation 48):

(50)

which, given that , yields:

(51)
(52)
(53)

Let . Creating a system of equations for all using Equation 53 gives:

(54)

and by solving it, one obtains:

(55)

Note: by convention .

From Equations 49 and 55, we obtain:

(56)
(57)

VI-B6 Example and counterexample

Let us consider two cases and of a sensor providing a mass function and . For each , a half-life time is known:

(58)