1 Introduction and background
Agents operating in noisy and complex environments will receive evidence from a variety of different sources, many of which will be at least partially inconsistent. In this paper we investigate the interaction between two broad categories of evidence: direct evidence from the environment, and evidence received from other agents with whom an agent is interacting or collaborating to perform a task. For example, robots engaged in a search and rescue mission will receive data directly from sensors as well as information from other robots in the team. Alternatively, software agents can have access to online data as well as sharing data with other agents.
The efficacy of combining these two types of evidence in multi-agent systems has been studied from a number of different perspectives. In social epistemology, [Douven and Kelp2011] have argued that agent-to-agent communication has an important role to play in propagating locally held information widely across a population. For example, interaction between scientists facilitates the sharing of experimental evidence. Simulation results are then presented which show that a combination of direct evidence and agent interaction, within the Hegselmann-Krause opinion dynamics model [Hegselmann and Krause2002], results in faster convergence to the true state than updating based solely on direct evidence. A probabilistic model combining Bayesian updating and probability pooling of beliefs in an agent-based system has been proposed in [Lee et al.2018a]. In this context it is shown that combining updating and pooling leads to faster convergence and better consensus than Bayesian updating alone. An alternative methodology exploits three-valued logic to combine both types of evidence [Crosscombe and Lawry2016] and has been effectively applied to distributed decision-making in swarm robotics [Crosscombe et al.2017].
In the current study we exploit the capacity of Dempster-Shafer theory (DST) to fuse conflicting evidence, in order to investigate how direct evidence can be combined with a process of iterative belief aggregation in the context of the best-of-$n$ problem. The latter refers to a general class of problems in distributed decision-making [Parker and Zhang2009, Valentini et al.2017] in which a population of agents must collectively identify which of $n$ alternatives is the correct, or best, choice. These alternatives could correspond to physical locations as, for example, in a search and rescue scenario, different possible states of the world, or different decision-making or control strategies. Agents receive direct but limited feedback in the form of quality values associated with each choice, which then influence their beliefs when combined with those of other agents with whom they interact. It is not our intention to develop new operators in DST nor to study the axiomatic properties of particular operators at the local level (see [Dubois et al.2016] for an overview of such properties). Instead, our main contribution is a study of the macro-level convergence properties of several established operators when applied iteratively by a population of agents over long timescales and in conjunction with a process of evidential updating.
An outline of the remainder of the paper is as follows. In Section 2 we give a brief introduction to the relevant concepts from DST and summarise its previous application to dynamic belief revision in agent-based systems. Section 3 introduces a version of the best-of-$n$ problem exploiting DST measures and combination operators. In Section 4 we then give the fixed point analysis of a dynamical system employing DST operators so as to provide insight into the convergence properties of such systems. In Section 5 we present the results from a number of agent-based simulation experiments, carried out to investigate consensus formation in the best-of-$n$ problem under varying rates of evidence and levels of noise. Finally, Section 6 concludes with some discussion.
2 An Overview of Dempster-Shafer Theory
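In this section we introduce relevant concepts from Dempster-Shafer theory (DST) [Dempster1967, Shafer1976], including four well-known belief combination operators. Given a set of states or frame of discernment $S = \{s_1, \ldots, s_n\}$, let $2^S$ denote the power set of $S$. An agent’s belief is then defined by a basic probability assignment, or mass function, $m : 2^S \to [0, 1]$, where $m(\emptyset) = 0$ and $\sum_{A \subseteq S} m(A) = 1$. The mass function then characterises a belief and a plausibility measure defined on $2^S$ such that for $A \subseteq S$;

$$ Bel(A) = \sum_{B \subseteq A} m(B) \quad \text{and} \quad Pl(A) = \sum_{B : B \cap A \neq \emptyset} m(B), $$

and hence $Pl(A) = 1 - Bel(A^c)$ where $A^c = S \setminus A$.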
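The two measures can be illustrated with a short Python sketch. It assumes the standard DST definitions (belief sums the mass of every focal set contained in a set, plausibility sums the mass of every focal set that intersects it). The dictionary-of-frozensets representation, the names `belief` and `plausibility`, and the example masses are illustrative choices, not part of the model described here.

```python
from typing import Dict, FrozenSet

# A mass function maps non-empty subsets of the frame to masses summing to 1.
Mass = Dict[FrozenSet[str], float]


def belief(m: Mass, a: FrozenSet[str]) -> float:
    """Bel(A): total mass of the focal sets that are subsets of A."""
    return sum(v for b, v in m.items() if b <= a)


def plausibility(m: Mass, a: FrozenSet[str]) -> float:
    """Pl(A): total mass of the focal sets that intersect A."""
    return sum(v for b, v in m.items() if b & a)


# Hypothetical three-state frame and mass function, for illustration only.
S = frozenset({"s1", "s2", "s3"})
m = {frozenset({"s1"}): 0.5, frozenset({"s1", "s2"}): 0.3, S: 0.2}
A = frozenset({"s1", "s2"})

print("Bel(A) =", belief(m, A))
print("Pl(A)  =", plausibility(m, A))
print("1 - Bel(A^c) =", 1.0 - belief(m, S - A))
```

The last two printed values agree, which is the duality between plausibility and belief of the complement.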
A number of operators have been proposed in DST for combining or fusing mass functions [Smets2007]. In this paper we compare the following operators in a dynamic multi-agent setting: Dempster’s rule of combination [Shafer1976], Dubois & Prade’s operator [Dubois and Prade2008], Yager’s operator [Yager1992], and a simple averaging operator. The first three operators all make the assumption of independence between the sources of the evidence to be combined but then employ different techniques for dealing with the resulting inconsistency. Dempster’s rule uniformly reallocates the mass associated with non-intersecting pairs of sets to the overlapping pairs, Dubois & Prade’s operator does not re-normalise in such cases but instead takes the union of the two sets, while Yager’s operator reallocates all inconsistent mass values to the universal set $S$. The four operators were chosen based on several factors: the operators are well established and have been well studied, they require no additional information about individual agents, and they are computationally efficient at scale (within the limits of DST). In the following we will see that these different ways of dealing with inconsistency result in significant differences in the convergence properties of the operators when used iteratively in an agent-based system.
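Definition 2.1 (Combination operators). Let $m_1$ and $m_2$ be mass functions on $2^S$. Then the combined mass function $m_1 \odot m_2$ is a function $m_1 \odot m_2 : 2^S \to [0, 1]$ such that for $\emptyset \neq A, B, C \subseteq S$;

Dempster’s Rule
$$ (m_1 \odot m_2)(C) = \frac{1}{1 - K} \sum_{A \cap B = C} m_1(A)\, m_2(B) $$

Dubois & Prade’s Operator
$$ (m_1 \odot m_2)(C) = \sum_{A \cap B = C} m_1(A)\, m_2(B) + \sum_{A \cap B = \emptyset,\ A \cup B = C} m_1(A)\, m_2(B) $$

Yager’s Operator
$$ (m_1 \odot m_2)(C) = \sum_{A \cap B = C} m_1(A)\, m_2(B) \ \text{ for } C \neq S, \qquad (m_1 \odot m_2)(S) = m_1(S)\, m_2(S) + K $$

Averaging Operator
$$ (m_1 \odot m_2)(C) = \frac{1}{2}\left( m_1(C) + m_2(C) \right) $$

where $K = \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B)$ is the total mass associated with non-intersecting pairs of sets.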
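A compact Python sketch of the four operators may help make their different treatment of inconsistency concrete. It follows the standard formulations of Dempster's rule, Dubois & Prade's operator, Yager's operator and simple averaging; the function name `combine`, the rule labels and the example mass functions are assumptions made for illustration only.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]
RULES = ("dempster", "dubois_prade", "yager", "average")


def combine(m1: Mass, m2: Mass, frame: FrozenSet[str], rule: str) -> Mass:
    """Combine two mass functions defined on the same frame with the named rule."""
    if rule not in RULES:
        raise ValueError(f"unknown rule: {rule}")
    if rule == "average":
        return {a: 0.5 * (m1.get(a, 0.0) + m2.get(a, 0.0)) for a in set(m1) | set(m2)}

    out: Mass = {}
    conflict = 0.0  # total mass on pairs of sets with empty intersection
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        if a & b:
            out[a & b] = out.get(a & b, 0.0) + x * y
        elif rule == "dubois_prade":
            out[a | b] = out.get(a | b, 0.0) + x * y  # keep the mass on the union
        else:
            conflict += x * y

    if rule == "dempster":
        if conflict >= 1.0 - 1e-12:
            raise ValueError("total conflict: Dempster's rule is undefined")
        out = {a: v / (1.0 - conflict) for a, v in out.items()}  # renormalise
    elif rule == "yager" and conflict > 0.0:
        out[frame] = out.get(frame, 0.0) + conflict  # move conflict to the frame
    return out


# Hypothetical, partially conflicting beliefs on a three-state frame.
S = frozenset({"s1", "s2", "s3"})
m1 = {frozenset({"s1"}): 0.6, S: 0.4}
m2 = {frozenset({"s2"}): 0.5, S: 0.5}
for rule in RULES:
    result = combine(m1, m2, S, rule)
    print(rule, {tuple(sorted(a)): round(v, 4) for a, v in result.items()})
```

In this example the conflicting mass is 0.3: Dempster's rule spreads it over the remaining focal sets by renormalising, Dubois & Prade's operator assigns it to the union of the two singletons, and Yager's operator adds it to the mass on the whole frame.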
In the agent-based model of the best-of-$n$ problem proposed in the following section, agents are required to make a choice as to which state they should investigate at any particular time. To this end we utilise the notion of pignistic distribution proposed by Smets and Kennes [Smets and Kennes1994]. For a given mass function, the associated pignistic distribution is a probability distribution on $S$ obtained by reallocating the mass associated with each set $A \subseteq S$ to the elements of that set as follows.
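Definition 2.2 (Pignistic distribution). Given a mass function $m$, the corresponding pignistic distribution on $S$ is given by;

$$ P(s_i \mid m) = \sum_{A : s_i \in A} \frac{m(A)}{|A|}. $$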
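The pignistic transformation is easy to sketch in Python: each focal set shares its mass equally among its elements. The function name `pignistic`, the dictionary representation and the example masses are illustrative assumptions.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def pignistic(m: Mass, frame: FrozenSet[str]) -> Dict[str, float]:
    """Share the mass of each focal set equally among the states it contains."""
    p = {s: 0.0 for s in frame}
    for a, v in m.items():
        for s in a:
            p[s] += v / len(a)
    return p


# Hypothetical mass function on a three-state frame.
S = frozenset({"s1", "s2", "s3"})
m = {frozenset({"s1"}): 0.5, frozenset({"s1", "s2"}): 0.3, S: 0.2}
p = pignistic(m, S)
for state in sorted(p):
    print(state, round(p[state], 4))
```

Because every unit of mass is passed on to exactly the states of its focal set, the resulting values always sum to one and can be used directly as sampling weights.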
DST has been applied to multi-agent dynamic belief revision in a number of ways. For example, [Wickramarathne et al.2014] and [Dabarera et al.2014] investigate belief revision where agents update their beliefs by taking a weighted combination of conditional belief values of other agents using Fagin-Halpern conditional belief measures. The latter are motivated by the probabilistic interpretation of Dempster-Shafer theory according to which a belief and plausibility measure are characterised by a set of probability distributions on $S$.
Several studies have applied a three-valued version of DST in multi-agent simulations. This corresponds to the case in which there are two states, $S = \{s_1, s_2\}$, one of which is associated with the truth value true, one with false and where the set $\{s_1, s_2\}$ is then taken as corresponding to a third truth state representing uncertain or borderline. One such approach based on subjective logic [Cho and Swami2014] employs the combination operator proposed in [Jøsang2002]. Another [Lu et al.2015] uses Dempster’s rule applied to combine an agent’s beliefs with an aggregate of those of her neighbours. Similarly, [Crosscombe and Lawry2016] uses the Dubois & Prade operator for evidence propagation.
With the exception of [Crosscombe and Lawry2016] in the $n = 2$ case, none of the above studies considers the interaction between direct evidential updating and belief combination. The main contribution of this paper is therefore to provide a detailed and general study of DST applied to dynamic multi-agent systems in which there is both direct evidence from the environment and belief combination between agents with partially conflicting beliefs. In particular, we will investigate and compare the consensus formation properties of the four combination operators when applied to the best-of-$n$ problem. In the next section we propose a formulation of the best-of-$n$ problem with agent beliefs and evidence represented in DST.
3 The Best-of-$n$ Problem
Here we present a formulation of the best-of-$n$ problem within the DST framework. We take the $n$ choices to be the states $S = \{s_1, \ldots, s_n\}$. Each state $s_i \in S$ is assumed to have an associated quality value $q_i$ which we take to be in the interval $[0, 1]$ with $0$ and $1$ corresponding to minimal and maximal quality, respectively. Alternatively, we might interpret $q_i$ as quantifying the level of available evidence that $s_i$ corresponds to the true state of the world.
In the best-of-$n$ problem agents explore their environment and interact with each other with the aim of identifying which is the highest quality (or true) state. Agents sample states and receive evidence in the form of the quality $q_i$, so that in the current context evidence $E_i$ regarding state $s_i$ takes the form of the following mass function;

$$ m_{E_i}(\{s_i\}) = q_i, \qquad m_{E_i}(S) = 1 - q_i. $$

Hence, $q_i$ is taken as quantifying both the evidence directly in favour of $s_i$ provided by $E_i$, and also the evidence directly against any other state $s_j$ for $j \neq i$. Given evidence $E_i$ an agent updates her belief by combining her current mass function $m$ with $m_{E_i}$ using a combination operator so as to obtain the new mass function given by $m \odot m_{E_i}$.
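As a small worked case, assume that the evidence mass function allocates $q_i$ to $\{s_i\}$ and the remaining $1 - q_i$ to $S$, and consider an agent in complete ignorance, i.e., with $m(S) = 1$. The only pairs of focal sets are $(S, \{s_i\})$ and $(S, S)$, neither of which is disjoint, so there is no conflicting mass and

$$ (m \odot m_{E_i})(\{s_i\}) = m(S)\, m_{E_i}(\{s_i\}) = q_i, \qquad (m \odot m_{E_i})(S) = m(S)\, m_{E_i}(S) = 1 - q_i. $$

Under Dempster’s rule, Dubois & Prade’s operator and Yager’s operator the ignorant agent therefore simply adopts the evidence, $m \odot m_{E_i} = m_{E_i}$. The averaging operator behaves differently even in this conflict-free case, giving $(m \odot m_{E_i})(\{s_i\}) = q_i / 2$ and $(m \odot m_{E_i})(S) = 1 - q_i / 2$.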
A summary of the process by which an agent might obtain direct evidence in this model is then as follows. Based on her current mass function $m$, an agent stochastically selects a state $s_i \in S$ to investigate, according to the pignistic probability distribution for $m$ as given in Definition 2.2. More specifically, she will update $m$ to $m \odot m_{E_i}$ with probability $P(s_i \mid m) \times r$ for $i = 1, \ldots, n$ and leave her belief unchanged with probability $1 - r$, where $r \in [0, 1]$ is a fixed evidence rate quantifying the probability of finding evidence about the state that she is currently investigating. In addition, we also allow for the possibility of noise in the evidential updating process. This is modelled by a random variable $\epsilon$ associated with each quality value. In other words, in the presence of noise the evidence $E_i$ received by an agent has the form;

$$ m_{E_i}(\{s_i\}) = q_i + \epsilon, \qquad m_{E_i}(S) = 1 - q_i - \epsilon, $$

where $\epsilon$ is a normally distributed random variable with mean $0$ and standard deviation $\sigma$ (we normalise so that if $q_i + \epsilon < 0$ then it is set to $0$, and if $q_i + \epsilon > 1$ then it is set to $1$). Overall, the process of updating from direct evidence is governed by the two parameters, $r$ and $\sigma$, quantifying the availability of evidence and the level of associated noise, respectively.
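The evidence-gathering step can be sketched in Python as follows. The sketch assumes that the evidence mass function places the (noisy, clamped) quality on the singleton of the investigated state and the remainder on the whole frame, that the noise is zero-mean Gaussian with standard deviation `sigma`, and that `r` is the evidence rate. The function names and the quality values in the example are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Optional

Mass = Dict[FrozenSet[str], float]


def pignistic(m: Mass, states: List[str]) -> List[float]:
    """Pignistic probability of each state, in the order given by `states`."""
    return [sum(v / len(a) for a, v in m.items() if s in a) for s in states]


def noisy_evidence(state: str, quality: float, frame: FrozenSet[str],
                   sigma: float, rng: random.Random) -> Mass:
    """Evidence mass function for `state`, with the noisy quality clamped to [0, 1]."""
    q = min(1.0, max(0.0, quality + rng.gauss(0.0, sigma)))
    evidence = {frozenset({state}): q, frame: 1.0 - q}
    return {a: v for a, v in evidence.items() if v > 0.0}  # drop zero masses


def sample_evidence(m: Mass, qualities: Dict[str, float], r: float,
                    sigma: float, rng: random.Random) -> Optional[Mass]:
    """Pick a state from the pignistic distribution; return evidence with probability r."""
    states = sorted(qualities)
    frame = frozenset(states)
    state = rng.choices(states, weights=pignistic(m, states))[0]
    if rng.random() >= r:
        return None  # no evidence found: the belief is left unchanged
    return noisy_evidence(state, qualities[state], frame, sigma, rng)


# Hypothetical quality values; an agent already certain of s3 always investigates s3.
qualities = {"s1": 0.25, "s2": 0.5, "s3": 0.75}
certain_s3 = {frozenset({"s3"}): 1.0}
rng = random.Random(0)
print(sample_evidence(certain_s3, qualities, r=1.0, sigma=0.0, rng=rng))
print(sample_evidence(certain_s3, qualities, r=0.0, sigma=0.0, rng=rng))
```

Returning `None` when no evidence is found keeps the choice of combination operator outside this function, so the same sampling step can be reused with any of the four operators.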
In addition to receiving direct evidence we also include belief combination between agents in this model. This is conducted in a pairwise symmetric manner in which two agents are selected at random to combine their beliefs, with both agents then adopting this combination as their new belief, i.e., if the two agents have beliefs $m_1$ and $m_2$, respectively, then they both replace these with $m_1 \odot m_2$.
4 Fixed Point Analysis
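Consider an agent model in which at each time step two agents are selected at random to combine their beliefs from a population of $k$ agents with beliefs quantified by mass functions $m_i^t$ for $i = 1, \ldots, k$. For any $t$ the state of the system can be represented by a vector of mass functions $\langle m_1^t, \ldots, m_k^t \rangle$. Without loss of generality we can assume that the updated state is then $\langle m_1^t \odot m_2^t, m_1^t \odot m_2^t, m_3^t, \ldots, m_k^t \rangle$. Hence, we then have a dynamical system characterised by the mapping;

$$ \langle m_1^t, m_2^t, m_3^t, \ldots, m_k^t \rangle \mapsto \langle m_1^t \odot m_2^t, m_1^t \odot m_2^t, m_3^t, \ldots, m_k^t \rangle. $$

The fixed points of this mapping are those for which $m_1^t = m_1^t \odot m_2^t$ and $m_2^t = m_1^t \odot m_2^t$. This requires that $m_1^t = m_2^t$ and hence the fixed points of the mapping are the fixed points of the operator, i.e., those mass functions $m$ for which $m \odot m = m$. In the following we analyse in detail the fixed points for the case in which there are $3$ states $S = \{s_1, s_2, s_3\}$. Let $m$ represent a general mass function defined on this state space, where without loss of generality we take $m(S) = 1 - \sum_{\emptyset \neq A \subsetneq S} m(A)$. For the Dubois & Prade operator the constraint that $m \odot m = m$ generates the following simultaneous equations.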
The Jacobian for this set of equations is given by;
The stable fixed points are those solutions to the above equations for which the eigenvalues of the Jacobian evaluated at the fixed point lie within the unit circle on the complex plane. In this case the only stable fixed points are the mass functions with $m(\{s_1\}) = 1$, $m(\{s_2\}) = 1$ and $m(\{s_3\}) = 1$. In other words, the only stable fixed points are those for which agents’ beliefs are both certain and precise. That is, where $m(\{s_i\}) = 1$ for some state $s_i \in S$. The stable fixed points for Dempster’s rule and Yager’s operator are also of this form. The averaging operator is idempotent and all mass functions are unstable fixed points.
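The fixed-point condition $m \odot m = m$ can be checked numerically. The following Python sketch does so for Dempster's rule, Dubois & Prade's operator and the averaging operator, using their standard formulations; the helper names and the example mass functions are illustrative assumptions. The check only tests whether a mass function is a fixed point and says nothing about its stability, which requires the Jacobian analysis described above.

```python
from itertools import product


def dubois_prade(m1, m2):
    out = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        c = (a & b) or (a | b)  # intersection, or the union if the sets are disjoint
        out[c] = out.get(c, 0.0) + x * y
    return out


def dempster(m1, m2):
    out, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        if a & b:
            out[a & b] = out.get(a & b, 0.0) + x * y
        else:
            conflict += x * y
    return {a: v / (1.0 - conflict) for a, v in out.items()}


def average(m1, m2):
    return {a: 0.5 * (m1.get(a, 0.0) + m2.get(a, 0.0)) for a in set(m1) | set(m2)}


def is_fixed_point(m, op, tol=1e-12):
    """True if combining m with itself under `op` returns m (up to `tol`)."""
    out = op(m, m)
    return all(abs(m.get(a, 0.0) - out.get(a, 0.0)) <= tol for a in set(m) | set(out))


certain = {frozenset({"s1"}): 1.0}                       # certain and precise
split = {frozenset({"s1"}): 0.5, frozenset({"s2"}): 0.5}  # hypothetical mixed belief

for name, op in [("dempster", dempster), ("dubois_prade", dubois_prade), ("average", average)]:
    print(name, is_fixed_point(certain, op), is_fixed_point(split, op))
```

A certain and precise belief passes the check under all three operators, the mixed belief fails it under Dubois & Prade's operator because the conflicting mass moves to the union of the two states, and every mass function passes under averaging because that operator is idempotent.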
The above analysis concerns agent-based systems applying a combination operator in order to reach consensus. However, we have yet to incorporate evidential updating into this model. As outlined in Section 3 it is proposed that each agent investigates a particular state $s_i$ chosen according to her current beliefs using the pignistic distribution. With probability $r$ this will result in her updating her beliefs from $m$ to $m \odot m_{E_i}$. Hence, for convergence it is also required that agents only choose to investigate states $s_i$ for which $m \odot m_{E_i} = m$. Assuming $q_i > 0$ then there is only one such fixed point, corresponding to $m(\{s_i\}) = 1$. Hence, the consensus driven by belief combination as characterised by the above fixed-point analysis will result in convergence of individual agent beliefs if we also incorporate evidential updating. That is, an agent with beliefs close to a fixed point of the operator, i.e., $m(\{s_i\}) = 1$, will choose to investigate state $s_i$ with very high probability and will therefore tend to be close to a fixed point of the evidential updating process.
5 Simulation experiments
In this section we describe simulation experiments to study the dynamics of the four belief combination operators within the best-of-$n$ model introduced in Section 3. The aim is to understand the behaviour of these combination operators in this dynamic multi-agent context and to compare their performance under different evidence rates and noise levels, i.e., for varying values of $r$ and $\sigma$.
Unless otherwise stated, all experiments share the following parameter values. We consider a population of agents with beliefs initialised so that;

$$ m(S) = 1. $$
In other words, at the beginning of each simulation every agent is in a state of complete ignorance as represented in DST by allocating all mass to the set of all states. Each experiment is run for a maximum of iterations, or until the population converges. Here, convergence requires that the beliefs of the population have not changed for interactions, where an interaction may be the updating of beliefs based on evidence or the combination of beliefs between agents.
For a given set of parameter values the simulation is run times and results are then averaged across these runs. Quality values are defined so that for and consequently is the best state. Hence, in the following, provides a measure of convergence performance for the two operators.
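The overall simulation loop can be sketched in Python under a number of stated assumptions. Agents start in complete ignorance; at each step every agent receives evidence with probability `r` about a state drawn from her pignistic distribution, and then one randomly chosen pair combines beliefs. The ordering of these two phases, the fixed number of steps in place of the convergence test, the use of Dubois & Prade's operator throughout, and all parameter and quality values below are hypothetical choices made for illustration. The `normalise` helper is a numerical safeguard against floating-point drift under repeated combination and is not part of the operator itself.

```python
import random
from itertools import product


def normalise(m):
    """Rescale to total mass 1 and drop zero masses (floating-point guard only)."""
    total = sum(m.values())
    return {a: v / total for a, v in m.items() if v > 0.0}


def dubois_prade(m1, m2):
    out = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        c = (a & b) or (a | b)  # intersection, or the union if the sets are disjoint
        out[c] = out.get(c, 0.0) + x * y
    return normalise(out)


def pignistic(m, states):
    return [sum(v / len(a) for a, v in m.items() if s in a) for s in states]


def belief(m, a):
    return sum(v for b, v in m.items() if b <= a)


def simulate(qualities, k, r, sigma, steps, seed):
    rng = random.Random(seed)
    states = sorted(qualities)
    frame = frozenset(states)
    population = [{frame: 1.0} for _ in range(k)]  # complete ignorance
    for _ in range(steps):
        # Evidential updating: each agent finds evidence with probability r.
        for idx, m in enumerate(population):
            if rng.random() < r:
                s = rng.choices(states, weights=pignistic(m, states))[0]
                q = min(1.0, max(0.0, qualities[s] + rng.gauss(0.0, sigma)))
                evidence = normalise({frozenset({s}): q, frame: 1.0 - q})
                population[idx] = dubois_prade(m, evidence)
        # Belief combination: one random pair adopts the combined belief.
        i, j = rng.sample(range(k), 2)
        merged = dubois_prade(population[i], population[j])
        population[i], population[j] = merged, dict(merged)
    return population


# Hypothetical parameter and quality values, chosen only to make the sketch runnable.
qualities = {"s1": 0.25, "s2": 0.5, "s3": 0.75}
pop = simulate(qualities, k=20, r=0.05, sigma=0.1, steps=500, seed=0)
best = frozenset({"s3"})
print("average belief in best state:", sum(belief(m, best) for m in pop) / len(pop))
```

Averaging the printed quantity over many seeds gives the kind of steady-state measure of convergence performance discussed in this section, although the numbers produced by this sketch are not comparable with the reported results.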
Initially we consider the best-of- problem where with quality values , and . Figure 1 shows belief values for the best state averaged across agents and simulation runs for the evidence rate and noise standard deviation . For the Dubois & Prade operator there is complete convergence to while for Dempster’s rule the average value of at steady state is approximately . Both Yager’s operator and the averaging operator do not converge to a steady state and instead maintain an average value of oscillating around and , respectively. For both Dubois & Prade’s operator and Dempster’s rule, at steady state the average belief and plausibility values are equal. This is consistent with the fixed point analysis given in Section 4 in showing that all agents converge to mass functions of the form for some state . Indeed, for the Dubois & Prade operator all agents converge to while for Dempster’s rule this happens in the large majority of cases. In other words, the combination of updating from direct evidence and belief combination results in agents reaching the certain and precise belief that is the true state of the world.
In the following subsections we describe a more systematic comparison of the operators in this context, for varying values of $r$ and $\sigma$ and also for different numbers of states $n$.
5.1 Varying Evidence Rates
In this section we investigate how the rate at which agents receive information from their environment affects their ability to reach a consensus about the true state of the world.
Figures 2 and 2 show steady state values of averaged across agents and simulation runs for evidence rates in the lower range and across the whole range , respectively. For each operator we compare the combination of evidential updating and belief combination (solid lines) with that of evidential updating alone (dashed lines). From Figure 2 we see that for low values of Dempster’s rule converges to higher average values of than do the other operators. Indeed, for the average value of obtained using Dempster’s rule is approximately higher than is obtained using Dubois & Prade’s operator, and is significantly higher still than that of the averaging operator and Yager’s rule. However, the performance of Dempster’s rule declines significantly for higher evidence rates and for it converges to average values for of less than . At , when every agent is receiving evidence at each time step, there is failure to reach consensus when applying Dempster’s rule. Indeed, there is polarisation with the population splitting into separate groups, each certain that a different state is the best. In contrast, Dubois & Prade’s operator performs well for higher evidence rates and for all there is convergence to an average value of . Neither the averaging operator nor Yager’s rule appear to perform differently for increasing evidence rates and instead maintain similar levels of performance for . In Figure 2 and then subsequent figures showing steady state results, we do not include error bars as this impacts negatively on readability. Instead, Figure 3 shows the standard deviation for plotted against evidence rate. As expected, standard deviation is high for low evidence rates in which the sparsity of evidence results in different runs of the simulation converging to different states. This then declines rapidly with increasing evidence rates.
The dashed lines in Figures 2 and 2 show the values of obtained at steady state when there is only updating based on direct evidence. In most cases the performance is broadly no better than, and indeed often worse than, the results which combine evidential updating with belief combination between agents. For low evidence rates where the population does not tend to fully converge to a steady state since there is insufficient evidence available to allow convergence in time steps. For higher evidence rates under Dubois & Prade’s operator and Dempster’s rule, the population eventually converges on a single state with complete certainty. However, since the average value of in both cases is approximately for then clearly convergence is often not to the best state. Overall, it is clear then that in this formulation of the best-of- problem combining both updating from direct evidence and belief combination results in much better performance than obtained by using evidential updating alone for both Dubois & Prade’s operator and Dempster’s rule. Meanwhile, it is apparent that the averaging operator is not affected by the combined updating method, whereas Yager’s rule is sometimes adversely affected for lower evidence rates .
5.2 Noisy Evidence
Noise is ubiquitous in applications of multi-agent systems. In embodied agents such as robots this is often a result of sensor errors, but noise can also be a feature of an inherently variable environment. In this section we consider the effect of evidential noise on the best-of-$n$ problem, as governed by the standard deviation $\sigma$ of the noise distribution.
Figure 4 shows the average value of at steady state plotted against for different evidence rates . Figure 4 shows that for an evidence rate , both Dubois & Prade’s operator and Dempster’s rule have very similar performance in the presence of noise. The averaging operator and Yager’s rule also exhibit similar performance to one another for this evidence rate. For example with no noise, i.e., , Dubois & Prade’s operator converges to an average value of , Dempster’s rule converges to on average, Yager’s rule to an average of and the averaging operator to . Then, with , Dubois & Prade’s operator converges to an average value of , Dempster’s rule to , Yager’s rule to and the averaging operator converges to . Hence, all operators are affected by the noise to a similar extent given this low evidence rate.
In contrast, for the evidence rates of and , Figures 4 and 4, respectively, we see that Dubois & Prade operator is the most robust operator to increased noise. Specifically, for and , Dubois & Prade’s operator converges to an average value of and for this only decreases to . On the other hand, the presence of noise at this evidence rate has a much higher impact on the performance of Dempster’s rule and the averaging operator. For Dempster’s rule converges to an average value of but this decreases to for , and for the averaging operator the average value of and decreases to . The contrast between the performance of the operators in the presence of noise is even greater for the evidence rate as seen in Figure 4. Yager’s rule is the exception in this context since for both evidence rates and , the average value of remains constant at approximately .
5.3 Scalability to Large Numbers of States
In many distributed decision-making applications the size of the state space, i.e., the value of $n$ in the best-of-$n$ problem, will be large. In the swarm robotics literature most best-of-$n$ studies are for $n = 2$ (see for example [Valentini et al.2014, Reina et al.2018]). However, there is growing interest in studying larger numbers of choices in this context [Reina et al.2017, Crosscombe et al.2017]. Hence, it is important to investigate the scalability of the proposed DST approach to large values of $n$.
Having up to now focused on the case, in this section we present additional simulation results for and . As proposed in the introduction to Section 5 the quality values are allocated so that for . Hence, the relevant values for the and cases are , and , respectively. For this section, we only consider Dubois & Prade’s operator and Dempster’s rule due to their better performance when compared with the other two combination operators.
For Dubois & Prade’s operator, Figure 5 shows steady state values of , i.e., belief in the best state for and , plotted against noise for evidence rate . Figure 5 then shows the same results for Dempster’s rule. For Dubois & Prade’s operator the steady state average is which is close to the value of for the case. However, for the value of is when , corresponding to a significant decrease in performance. Nonetheless, from Figure 5, we see that for Dempster’s rule performance declines much more rapidly with increasing than for Dubois & Prade’s operator. So at and the average value at steady state for Dempster’s rule is and for it is . As expected the performance of both operators decreases as increases, with Dubois & Prade’s operator also being more robust to noise than Dempster’s rule for large values of .
Hence, the results support only limited scalability for the DST approach to the best-of-$n$ problem, at least as far as uniquely identifying the best state is concerned. Furthermore, as $n$ increases so does sensitivity to noise. This reduced performance may in part be a feature of the way quality values have been allocated. Notice that as $n$ increases the difference between successive quality values decreases. This is likely to make it difficult for a population of agents to distinguish between the best state and those which have increasingly similar quality values. Furthermore, a given noise standard deviation $\sigma$ is more likely to result in an inaccurate ordering of the quality values the closer those values are to each other.
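The last point can be made quantitative with a small calculation. Assuming two quality values that differ by `delta`, each observed once with independent zero-mean Gaussian noise of standard deviation `sigma`, and ignoring the clamping of noisy values to the unit interval, the difference between the two observations is normally distributed with mean `delta` and variance `2 * sigma**2`. The Python sketch below computes the resulting probability that the two observations appear in the wrong order; the function name and the example gaps are hypothetical.

```python
import math


def misorder_probability(delta: float, sigma: float) -> float:
    """P(observation of the lower quality exceeds that of the higher quality).

    Assumes independent N(0, sigma^2) noise on each observation and no clamping,
    so the observed difference is N(delta, 2 * sigma^2).
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    # Phi(-delta / (sigma * sqrt(2))) written with the complementary error function.
    return 0.5 * math.erfc(delta / (2.0 * sigma))


sigma = 0.1
for delta in (0.25, 0.1, 0.05, 0.01):  # hypothetical gaps between quality values
    print(f"gap {delta:.2f}: misordering probability {misorder_probability(delta, sigma):.3f}")
```

For a fixed `sigma` the probability rises towards one half as the gap shrinks, which is consistent with the increased sensitivity to noise when quality values are packed more closely together.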
6 Discussion and Conclusions
In this paper we have introduced a model of consensus formation in the best-of-$n$ problem which combines updating from direct evidence with belief combination between pairs of agents. We have utilised DST as a convenient framework for representing agents’ beliefs, as well as the evidence that agents receive from the environment. In particular, we have studied and compared the macro-level convergence properties of several established operators applied iteratively in a dynamic multi-agent setting and through simulation we have identified several important properties of these operators within this context. Dubois & Prade’s operator is shown to be most effective at reducing polarisation and reaching a consensus for all except very low evidence rates, despite it not satisfying certain desirable properties, e.g., associativity. It is also more robust to noise. We believe that underlying the difference in the performance of all but the averaging operator is the way in which they differ in their handling of inconsistent beliefs.
Further work will investigate the issue of scalability in more detail, including whether alternatives to the updating process may be applicable in a DST model, such as that of negative updating in swarm robotics [Lee et al.2018b]. We must also consider the increasing computational cost of DST as the size of the state space increases and investigate other representations such as possibility theory as a means of avoiding exponential increases in the cost of storing and combining mass functions. Finally, we hope to adapt our method to be applied to a network, as opposed to a complete graph, so as to study the effects of limited or constrained communications on convergence.
This work was funded and delivered in partnership between Thales Group, University of Bristol and with the support of the UK Engineering and Physical Sciences Research Council, ref. EP/R004757/1 entitled “Thales-Bristol Partnership in Hybrid Autonomous Systems Engineering (T-B PHASE).”
- [Cho and Swami2014] Jin-Hee Cho and Ananthram Swami. Dynamics of Uncertain Opinions in Social Networks. 2014 IEEE Military Communications Conference, pages 1627–1632, 2014.
- [Crosscombe and Lawry2016] Michael Crosscombe and Jonathan Lawry. A Model of Multi-Agent Consensus for Vague and Uncertain Beliefs. Adaptive Behavior, 24(4):249–260, 2016.
- [Crosscombe et al.2017] Michael Crosscombe, Jonathan Lawry, Sabine Hauert, and Martin Homer. Robust distributed decision-making in robot swarms: Exploiting a third truth state. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4326–4332. IEEE, sep 2017.
- [Dabarera et al.2014] Ranga Dabarera, Rafael Núñez, Kamal Premaratne, and Manohar N. Murthi. Dynamics of Belief Theoretic Agent Opinions Under Bounded Confidence. 17th International Conference on Information Fusion (FUSION), pages 1–8, 2014.
- [Dempster1967] Arthur P. Dempster. Upper and Lower Probabilities Induced by a Multivalued Mapping. The Annals of Mathematical Statistics, 38(2):325–339, 1967.
- [Douven and Kelp2011] Igor Douven and Christoph Kelp. Truth Approximation, Social Epistemology, and Opinion Dynamics. Erkenntnis, 75:271–283, 2011.
- [Dubois and Prade2008] Didier Dubois and Henri Prade. A set-theoretic view of belief functions. Classic Works of the Dempster-Shafer Theory of Belief Functions. Studies in Fuzziness and Soft Computing, 219:375–410, 2008.
- [Dubois et al.2016] Didier Dubois, Weiru Liu, Jianbing Ma, and Henri Prade. The basic principles of uncertain information fusion. An organised review of merging rules in different representation frameworks. Information Fusion, 32:12–39, 2016.
- [Hegselmann and Krause2002] Rainer Hegselmann and Ulrich Krause. Opinion dynamics and bounded confidence: models, analysis and simulation. J. Artificial Societies and Social Simulation, 5, 2002.
- [Jøsang2002] Audun Jøsang. The consensus operator for combining beliefs. Artificial Intelligence, 141(1-2):157–170, oct 2002.
- [Lee et al.2018a] Chanelle Lee, Jonathan Lawry, and Alan Winfield. Combining Opinion Pooling and Evidential Updating for Multi-Agent Consensus. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 347–353, 2018.
- [Lee et al.2018b] Chanelle Lee, Jonathan Lawry, and Alan Winfield. Negative Updating Combined with Opinion Pooling in the Best-of-n Problem in Swarm Robotics. In Proceedings of the International Conference on Swarm Intelligence ANTS 18, pages 97–108, 2018.
- [Lu et al.2015] Xi Lu, Hongming Mo, and Yong Deng. An evidential opinion dynamics model based on heterogeneous social influential power. Chaos, Solitons and Fractals, 73:98–107, 2015.
- [Parker and Zhang2009] C. A. C. Parker and H. Zhang. Cooperative decision-making in decentralized multiple-robot systems: The best-of-n problem. IEEE/ASME Transactions on Mechatronics, 14(2):240–251, April 2009.
- [Reina et al.2017] Andreagiovanni Reina, James A. R. Marshall, Vito Trianni, and Thomas Bose. Model of the best-of-N nest-site selection process in honeybees. Phys. Rev. E, 95:052411, May 2017.
- [Reina et al.2018] Andreagiovanni Reina, Thomas Bose, Vito Trianni, and James A. R. Marshall. Effects of Spatiality on Value-Sensitive Decisions Made by Robot Swarms, pages 461–473. Springer International Publishing, Cham, 2018.
- [Shafer1976] Glenn Shafer. A Mathematical Theory of Evidence. Princeton University Press, Princeton, 1976.
- [Smets and Kennes1994] Philippe Smets and Robert Kennes. The Transferable Belief Model. Artificial Intelligence, 66:387–412, 1994.
- [Smets2007] Philippe Smets. Analyzing the combination of conflicting belief functions. Information Fusion, 8(4):387–412, 2007.
- [Valentini et al.2014] Gabriele Valentini, Heiko Hamann, and Marco Dorigo. Self-organized collective decision making: The weighted voter model. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS ’14, pages 45–52, Richland, SC, 2014. International Foundation for Autonomous Agents and Multiagent Systems.
- [Valentini et al.2017] Gabriele Valentini, Eliseo Ferrante, and Marco Dorigo. The Best-of-n Problem in Robot Swarms: Formalization, State of the Art, and Novel Perspectives. Frontiers in Robotics and AI, 4(March), 2017.
- [Wickramarathne et al.2014] Thanuka L. Wickramarathne, Kamal Premaratne, Manohar N. Murthi, and Nitesh V. Chawla. Convergence Analysis of Iterated Belief Revision in Complex Fusion Environments. IEEE Journal of Selected Topics in Signal Processing, 8(4):598–612, 2014.
- [Yager1992] Ronald R. Yager. On the specificity of a possibility distribution. Fuzzy Sets and Systems, 50(3):279–292, 1992.