GUT: A General Cooperative Multi-Agent Hierarchical Decision Architecture in Adversarial Environments

04/23/2020 ∙ by Qin Yang, et al.

Adversarial Robotics is a burgeoning research area in Swarms and Multi-Agent Systems. It mainly focuses on agents working in dangerous, hazardous, and risky environments, which prevent robots from achieving their tasks smoothly. In adversarial environments, adversaries can be intentional or unintentional depending on their needs and motivations. Agents need to adopt strategies suited to the current situation to maximize their utility or satisfy their needs. In this paper, we design a game-like Exploration task, where both intentional (Monsters) and unintentional (Obstacles) adversaries challenge the Explorer robots in achieving their target. In order to mimic the rational decision process of an intelligent agent, we propose a new Game-Theoretic Utility Tree (GUT) architecture combining the core principles of game theory, utility theory, probabilistic graphical models, and a tree structure that decomposes high-level strategies into executable lower levels. We show through simulation experiments that, through the use of GUT, the Explorer agents can effectively cooperate among themselves, increase the utility of the individual agents and of the global system, and achieve higher success in task completion.

I Introduction

Natural systems have been the key inspirations in the design, study, and analysis of multi-robot and multi-agent systems (MAS) [30, 5, 6, 9]. For example, when a simple individual agent interacts with another agent or with the environment, it usually tries to find a suitable way to adapt to the current situation and satisfy its basic needs in the short term. But in a more complex system with multiple agents, individual agents usually build cooperative alliances based on commonly agreed needs to maximize their benefits, mitigate challenges in the environment, and achieve their long-term goals and global objectives. Cooperation in MAS can maximize system utility and guarantee sustainable development for each group member [32]. On the other hand, it is also important for all agents in the MAS team to cooperatively perceive the environment and recognize the threats and adversaries in it. An adversary in the environment impairs the ability of the individual agents and the global MAS to achieve their tasks and also challenges their needs in certain scenarios [8, 15, 4].

Fig. 1: An illustrative game scenario where the paths (to Task) of the Explorer robots are blocked by Monster robots.

Following the examples from Information Systems [14], we can classify adversarial agents based on their needs and motivations into two general categories: intentional (which actively impair the MAS needs and capabilities, such as an enemy or intelligent opponent agent) and unintentional (which might potentially or passively threaten MAS abilities, like obstacles and weather). They can also be referred to as deliberate (Monsters) and accidental (Obstacles) adversaries, respectively.

Recently, researchers have combined the disciplines of Robotics and adversarial MAS into Adversarial Robotics, which focuses on autonomous agents and mobile robots operating in adversarial environments [2, 36, 28, 8]. Generally, we can describe an adversarial environment as a scenario that combines intentional and unintentional adversaries, which prevent robots from satisfying their needs and achieving their tasks.

Moreover, in a dynamically changing environment, agents frequently decide to switch their behaviors and actions according to the situation and their needs. For example, Agent 1 might be recognized as an adversary by Agent 2 in one scenario, and when the situation later changes their needs, they might develop a neutral relationship or become allies and cooperatively perform a task.

Most past and current research focuses on the unintentional adversaries in the environment, such as path planning that avoids static or dynamic obstacles, formation control that avoids collisions, and so forth [31]. This is particularly applicable to urban search and rescue missions and robots deployed in disaster environments, where the robots are more concerned about unintentional threats such as radiation, fire, and water.

In this paper, we propose a general decision architecture to mimic the rational thinking process of intelligent agents in an adversarial environment. We design a simple exploration game (see Fig. 1) which contains both intentional and unintentional adversaries. The main contributions of the paper are outlined below.

  • First, we define the adversarial environment from a robot-needs perspective and treat the two adversary types, unintentional and intentional, separately.

  • Second, we propose a new Game-Theoretic Utility Tree (GUT), which combines the principles and merits of Game Theory [24], Utility Theory [11], Probabilistic Graphical Models (PGM) [19], and a Tree Structure that exploits their hierarchy. It can calculate suitable tactics (behaviors) based on the current utility at multiple levels and decompose high-level strategies into low-level executable plans.

  • Third, to tackle static unintentional adversaries in MAS, we present an efficient distributed algorithm called "Adapting The Edge", which combines individual adapting behaviours and group cooperation.

  • Finally, we validate the proposed approach through extensive simulation experiments demonstrating its utility in a simple Exploration Game considering various scenarios of Explorer to Monster ratios with distinct cooperative models under different environmental settings.

II Related Work

In the majority of the Adversarial Robotics literature, the adversaries are not artificially intelligent agents [2, 36]. They might be natural forces like wind, fire, and rain, or other creatures' aggressive behaviors. From the task's perspective, some researchers categorize multi-robot adversarial robotics into four main classes:

  • Adversarial Patrol [3]

  • Adversarial Coverage [2, 36]

  • Adversarial Formation [31]

  • Adversarial Navigation [16, 1, 17]

However, the greatest challenges usually come from the physical world, especially motion in dynamic and continuous spaces. When we model the uncertainty in an adversarial environment, we need to consider how to build suitable and specific models for robots with respect to self and opponent perception, utility calculation, decision making, and motion planning.

In recent research, Lin [21] examined the problem of defending against a sequential attack in a knowledgeable adversarial environment. Zheng et al. [37] studied multi-robot privacy with an adversarial approach. From the swarm robotics perspective, Sanghvi and Sycara [28] identified a swarm vulnerability and studied how an adversary can take advantage of it. From a machine learning perspective, Paulos and Kumar [26] described an architecture for training teams of identical agents in a defense game.

Some studies also focus on the multi-player pursuit-evasion game problem [34, 8, 20], which mainly deals with how to guide one or a group of pursuers to catch one or a group of moving evaders [7]. This problem covers formation keeping, conflict resolution, and optimal task allocation [12]. More recent works mainly concentrate on optimal evasion strategies and task allocation [29, 22] and predictive learning from agents' behaviors [33]. In our Exploration Game, an individual agent's motivation is not to pursue or catch specific agents; instead, based on their shared needs and cooperation with the other agents in the system, the agents explore an adversarial area while satisfying their task requirements and mission objectives.

However, little research has been done on confrontational strategies, preventive control, and behaviors to mitigate intentional adversaries, which can be considered active, intelligent opponent agents.

Intentional adversaries also play an important role in many applications such as military and defense, where multi-robot systems cooperating to achieve global missions must consider not only unintentional adversaries like wind and obstacles impeding their paths toward their targets, but also intelligent adversarial agents such as enemy robots.

In a MAS, the agents have to exhibit an awareness of the environment not only at an individual agent level but also at a system level, where computational game theory provides useful examples of the study of machine behaviour [27]. To address the gaps in the literature, we build a general GUT architecture combining Game Theory [25, 24], Utility Theory [11, 18], Probabilistic Graphical Models (PGM) [19, 13], and a Tree Structure to calculate and decompose the decision strategies, specifically to tackle intentional adversaries. We also design an algorithm termed "Adapting The Edge" to help the multi-robot system (MRS) avoid static unintentional adversaries efficiently.

III Problem Statement

Multiple robots or a MAS working in an adversarial environment form a complex distributed system, especially when multiple groups of intelligent agents with different purposes interact with each other, presenting various relationships and behaviours. The most important challenge in this scenario is how to organize these robots to work together and adopt suitable strategies that guarantee their maximum utility against the adversaries' tactics.

We design an Exploration Game in which a group of Explorer agents traverses an adversarial environment to reach a treasure (Target), as shown in Fig. 1. In this scenario, several Monsters (intelligent autonomous agents) representing intentional adversaries are randomly distributed along the path to the treasure. Once the Monsters detect any Explorers, they will prevent them from passing through. Two mountain-like Obstacles are considered unintentional adversaries impeding the Explorers' movement task.

In this whole process, we assume the Monsters do not communicate with each other (they act independently) and all act out of greedy self-interest (individual rationality), which means each Monster only cares about its own benefit. The Explorers, however, can communicate with the other Explorer agents and share information with each other, representing collective rationality. The problem to be solved is to devise cooperative strategies for the Explorer agents such that they all collectively reach the Target while tackling both intentional and unintentional adversaries on the way. Through this problem, we aim to evaluate the individual and system utilities of Explorers and Monsters under different strategies, scenarios, and environments.

IV Approach Overview

To solve the above problem, the autonomous Explorer agents need to adopt various strategies and plans based on their current status (needs and utility) and optimize or guarantee the utility of the individual robots and of the collective MAS.

For the intentional adversaries (Monsters), we design the GUT architecture (Fig. 2), which calculates each level's tactics based on the current utility and decomposes the high-level decision into lower levels. We also add two kinds of assumption at the decision level: one is the irrational decision, which means that if the value of an individual's utility drops below a certain level, it exhibits a kind of instinctive behavior, such as escaping, to guarantee its safety. On the other hand, if the current condition satisfies the low-level needs, such as safety or basic needs, the agent enters the rational decision process.

For simplicity, we build a three-level GUT, which guarantees that the individual robot's rational decision can be decomposed to an executable level. The first level (high level) determines whether to attack or defend. The second level figures out the specific agent to be attacked (or defended from). According to these previous decisions, the last level (lowest level) decides how the agents should group themselves to adapt to the current situation.

Fig. 2: General Individual Robot’s GUT

More specifically, in the first level we define that Explorers and Monsters both have two strategies, attack and defend, which are represented through corresponding formation shapes. According to the payoff matrix (Table I) in a zero-sum game, they calculate the strategy that fits the current situation. Then, based on this precondition from the first level, they decide which specific agent to attack or defend. For example, we assume that in the attacking mode Explorers and Monsters have two kinds of behaviors: attacking the nearest agent or attacking the agent with the lowest attacking ability in the formation. In the defending mode, they can choose to defend the nearest agent or the agent with the lowest attacking ability. Through the corresponding payoff matrix (Table II), they confirm the target sequence. Finally, in the lowest level, according to the corresponding tactics payoff matrix (Table III), each individual calculates the final tactic, such as the number of groups for the Explorers and whether or not to follow others for the Monsters.

Through this process, we decompose the individual agent's strategy into three levels, and each level focuses on a different utility corresponding to different needs or requirements. In order to simplify the calculation, we respectively use the winning probability, the relative expected energy cost, and the relative expected HP (health power) cost, each measured as the expected utility difference between the two sides, to quantify each level's utility. In the first level, the utility presents the winning probability, which relates to the number of currently perceived adversaries and the individual attacking and defending abilities. In the second level, we consider the relative expected energy cost, which depends on the agents' distribution and numbers. In the lowest level, we use the relative expected HP cost, which reflects the individual's and group's current information, such as the number of groups and the agent's current energy level. This decision decomposition also disassembles the individual's needs into different levels, mimicking an intelligent agent's thinking process [35]. A code sketch after Table III illustrates this level-by-level decomposition.

MB \ EB (Utility) Attack Defend
Attack
Defend
TABLE I: Level 1 - Explorer & Monster Tactics Payoff Matrix
MB \ EB (Utility) Nearest A Lowest A Highest
Nearest
A Lowest
A Highest
TABLE II: Level 2 - Explorer & Monster Tactics Payoff Matrix
MB \ EB (Utility) One Group Two Group Three Group
Independent
Dependent
TABLE III: Level 3 - Explorer & Monster Tactics Payoff Matrix
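To make the decomposition concrete, the following Python sketch (all names and payoff constructions are hypothetical illustrations, not the paper's implementation) shows how a three-level GUT could be organized: each level solves a small payoff matrix such as Tables I-III and passes its chosen tactic down as the next level's precondition. For brevity it only applies a pure-strategy maximin rule; mixed strategies are treated in Section V.

```python
import numpy as np

# Hypothetical tactic sets for each GUT level (Explorer tactics, Monster tactics).
LEVEL_TACTICS = [
    (["attack", "defend"], ["attack", "defend"]),                    # Level 1
    (["nearest", "lowest_attack"], ["nearest", "lowest_attack"]),    # Level 2
    (["one_group", "two_groups", "three_groups"],
     ["independent", "dependent"]),                                  # Level 3
]

def maximin_tactic(payoff):
    """Pick the row (Explorer) tactic with the best worst-case utility in a zero-sum matrix."""
    row_worst = payoff.min(axis=1)
    return int(np.argmax(row_worst))

def gut_decide(payoff_fns, state):
    """Walk the GUT top-down; each level's payoff may depend on the tactics chosen above it."""
    chosen = []
    for level, (explorer_tactics, _monster_tactics) in enumerate(LEVEL_TACTICS):
        payoff = payoff_fns[level](state, chosen)   # utility matrix for this level
        chosen.append(explorer_tactics[maximin_tactic(payoff)])
    return chosen   # e.g. ["attack", "nearest", "three_groups"]
```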

For the unintentional adversaries, we design the Adapting The Edge algorithm, which helps an individual agent tackle (static) unintentional adversaries by adapting its trajectory along their edges until it finds a suitable route to the goal point. In this process, through communication and information sharing with other Explorer agents, the individual agent can select the moving direction with less possibility of potential collision with the unintentional adversaries. In our scenarios, the two mountains represent the unintentional adversaries, and the Explorers need to find a path passing through them.

Initially, the group of Explorers forms a patrol formation to explore the unknown world and selects the shortest path to the treasure. When they perceive an unintentional adversary (a mountain), individual agents need to adapt to their current situation and combine perceived and shared information to find a route past it. If they detect intentional adversaries (Monsters), each Explorer computes its current strategy based on the GUT and then cooperates with the others through negotiation and agreement. These behaviors represent a form of global behavior exhibiting Collective Rationality and caring about the group interest. In contrast, each Monster follows the same process but does not cooperate with the others and is self-interested. Additionally, in this process, if an individual agent's HP value drops below a threshold level, it adopts an instinctive (irrational) behavior by escaping from the current situation to satisfy its safety needs.

V Formalization and Algorithms

Below, we formalize the adversarial environment and the decision processes for intentional and unintentional adversaries.

V-A Adversarial Environment

Suppose we have three agents in a certain scenario. The first agent needs to fulfil its task while satisfying its current need. In order to quantify the agent's need, we use the utility function in Eq. (1).

(1)

The arguments of the utility function represent the various factors involved in the calculation, such as time, energy, and so forth. According to the agent's capabilities, it has a solution space of feasible actions to perform. We assume the agent can complete the task using this solution space without any interruption (not considering the other two agents), obtaining a corresponding utility value. Then, considering the other two agents, if the agent cannot find any solution in its solution space that keeps Eq. (2) satisfied, it regards the other two agents as Adversaries.

(2)

In addition, if an agent's next action can increase its own current utility by impairing the first agent, which means its expected utility satisfies Formula (3), it can be regarded as an Intentional Adversary. On the other hand, if its current solution does not impact the first agent's utility, or the effect is always zero as in Formulas (4) and (5), we consider it an Unintentional Adversary.

(3)
(4)
(5)
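As a minimal illustration of this classification (the function and argument names below are hypothetical placeholders standing in for the elided symbols, not the paper's exact formulas), an agent could label another agent as follows:

```python
def classify_other_agent(best_utility_alone, best_utility_with_other, other_gain_from_blocking):
    """Classify another agent relative to this agent's needs.

    best_utility_alone:       utility expected when no other agent interferes
    best_utility_with_other:  best achievable utility when the other agent is present
    other_gain_from_blocking: expected utility the other agent gains by impairing us
    """
    if best_utility_with_other >= best_utility_alone:
        return "not an adversary"          # some solution still satisfies Eq. (2)
    if other_gain_from_blocking > 0:
        return "intentional adversary"     # impairing us raises its own utility, Formula (3)
    return "unintentional adversary"       # it impairs us but gains nothing, Formulas (4)-(5)
```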

V-B Intentional Adversaries Decision

For decisions against intentional adversaries, we have a game at each level of the GUT (Fig. 2), which can be described as Formula (6):

(6)

Using this notation to present each agent's utility, the tactic sets of the two agents can be presented as Formulas (7) and (8):

(7)
(8)

Every finite game has a Pure Strategy Nash Equilibrium or a Mixed Strategy Nash Equilibrium, so the process can be formalized in two steps:
a. Compute Pure Strategy Nash Equilibrium

We can present agents’ utility matrix as Formula. 9:

(9)

The rows and columns correspond to the utilities of the two agents, respectively. By calculating the list of minimum values of each row and the list of maximum values of each column, we can compute the maximum of the first list and the minimum of the second.

(10)

If these two values satisfy Formula (10), we obtain the Pure Strategy Nash Equilibrium of this level.

(11)
(12)
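A minimal sketch of this maximin/minimax (saddle point) check, assuming the matrix holds the row player's utility in a zero-sum game (the variable names are ours):

```python
import numpy as np

def pure_strategy_equilibrium(U):
    """Return the pure-strategy equilibrium (row, col) of a zero-sum utility matrix U, if any."""
    row_mins = U.min(axis=1)              # worst case of each row tactic
    col_maxs = U.max(axis=0)              # worst the column player can force per column
    maximin, minimax = row_mins.max(), col_maxs.min()
    if np.isclose(maximin, minimax):      # condition of Formula (10): a saddle point exists
        return int(row_mins.argmax()), int(col_maxs.argmin())
    return None                           # otherwise fall back to a mixed strategy

# Example: this game has a saddle point at row 0, column 1 with value 2.0.
U = np.array([[3.0, 2.0],
              [1.0, 0.5]])
print(pure_strategy_equilibrium(U))      # (0, 1)
```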

b. Compute Mixed Strategy Nash Equilibrium

The tactics’ probability of agent present as Formula. 13.

(13)

Similarly, we can express the second agent's tactic probabilities as Formula (14).

(14)

As discussed above, the mixed strategy profile can be defined for a given state. Then, we can deduce the expected utilities of the two agents as Formulas (15) and (16), respectively.

(15)
(16)

In the mixed-strategy case, if we can obtain all the expected utilities of the two agents as in Formulas (17) and (18), we can deduce the game's value as in Formula (19). Then, if we can compute a tactic satisfying Formulas (20) and (21), we define this tactic as the optimal strategy in the current state, and the game result is given by Formula (22).

(17)
(18)
(19)
(20)
(21)
(22)

After computing these two steps, if the result is a Pure Strategy Nash Equilibrium, the individual agent obtains a unique tactic to carry into the next level, which means this tactic's probability is one hundred percent. Otherwise, with a Mixed Strategy Nash Equilibrium, it obtains several tactics, each with a certain probability. The probability of each level's feasible solution then becomes the conditional probability for the next level: if one feasible solution at a given level has a certain probability, each branch of feasible solutions at the next level is weighted by it. We can represent this entire process as in Fig. 2. Upon reaching the lowest level, the individual agent chooses the most probable and suitable tactic set to adapt to the current situation.
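For the two-tactic case, the mixed strategy can be written in closed form from the indifference conditions behind Formulas (15)-(21); the sketch below handles only a 2x2 zero-sum level (the general case is solved with Gambit in Section VI), and the matrix layout is our assumption:

```python
def mixed_strategy_2x2(U):
    """Mixed-strategy equilibrium of a 2x2 zero-sum game; U holds the row player's utilities."""
    (a, b), (c, d) = U
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: a pure-strategy equilibrium exists")
    p = (d - c) / denom              # probability the row player picks its first tactic
    q = (d - b) / denom              # probability the column player picks its first tactic
    value = (a * d - b * c) / denom  # expected utility (game value) for the row player
    return p, q, value

# Matching pennies: both sides mix 50/50 and the game value is 0.
print(mixed_strategy_2x2([[1.0, -1.0], [-1.0, 1.0]]))   # (0.5, 0.5, 0.0)
```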

To summarize, according to the set of individual tactics at each level, we can build a finite solution space. Then, by computing each level's Nash Equilibrium, we obtain the corresponding GUT for the current state.

The most important and challenging part here is the utility design, which connects the decision level and the action (planning) level. It also determines whether the agent or system can calculate reasonable tactics to optimize its performance. In this step, we first assume each robot has two basic attributes: Energy Level and HP. According to the discussion in Section IV, the first GUT level uses the winning probability, so the utility presents its expectation, and we can formalize it as Formula (23).

(23)

The second Level’s utility is described the relative expected energy cost as Formula. 24, 25, 26 and 25. And we consider three parts of energy cost in the whole process: , and .

(24)
(25)
(26)
(27)

In the entire process, the agent's movement distance, the numbers of attacks made and received, and the number of communications each obey a corresponding probability distribution, so we can describe their distribution functions as Formulas (28), (29), (30), and (31).

(28)
(29)
(30)
(31)

In the lowest level, we use the expected HP cost to express the utility, as in Formulas (32), (33), (34), and (35).

(32)
(33)
(34)
(35)

The corresponding terms are similar to Formulas (29) and (30). Here, the remaining symbols denote, respectively: the numbers of Explorers and Monsters; the average distance between the two opposing groups; the agent's velocity; the numbers of attacks made and received; the Explorers' number of communications; the unit attacking energy cost of each side's agents; the average attacking ability levels of both sides; the average defending ability levels of both sides; a specific agent's attacking ability level on each side; a specific agent's defending ability level on each side; the individual agent's size; the number of Explorers attacking simultaneously; the number of Monsters attacking simultaneously; the corresponding coefficients; the current energy levels of an Explorer and a Monster; the current HP levels of an Explorer and a Monster; and the probability corresponding to each different section.

According to the above discussion, we assume that Explorers and Monsters move at the same speed and that Monsters cannot communicate with each other; we can then simplify Formulas (24) and (32) into Formulas (36) and (37).

(36)
(37)

Through Formulas (23), (36), and (37), an individual agent can calculate the utility and obtain the Nash Equilibrium of each level's payoff matrix. After computing the entire GUT, it combines each level's tactics and executes the integrated strategy at the planning level. We describe the decision process in Alg. 1.

Input : Explorers’ and Monsters’ states
Output : formation shape ; current attacking target ; number of groups .
1 set state = ;
2 while the changing of Monster’s number == True And Monster’s number != 0 do
3        if state== then
4               Compute the Nash Equilibrium;
5               Get the most feasible formation shape ;
6               state =
7        else if state== And s != Null then
8               Compute the Nash Equilibrium;
9               Get the most feasible attacking target ;
10               state =
11        else if state== And s, t != Null then
12               Compute the Nash Equilibrium;
13               Get the most feasible number of groups ;
14              
15        end if
16       
17 end while
18if Monster’s number == 0 then
19        = ;
20        = 1;
21 end if
return
Algorithm 1 Explore Game GUT Model
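A Python sketch of this decision loop (the level-solving routine is a placeholder assumed to build each level's payoff matrix from Formulas (23), (36), and (37) and return its equilibrium tactic; state and key names are illustrative):

```python
def gut_explore_game(explorer_state, monster_state, solve_level):
    """Rerun the three GUT levels whenever the set of perceived Monsters changes."""
    if monster_state["count"] == 0:
        # No intentional adversaries: fall back to the patrol formation in one group.
        return {"formation": "patrol", "target": None, "groups": 1}

    context = {"explorers": explorer_state, "monsters": monster_state}
    formation = solve_level(1, context)                                  # attack or defend
    target = solve_level(2, {**context, "formation": formation})         # whom to engage
    groups = solve_level(3, {**context, "formation": formation,
                             "target": target})                          # how to split up
    return {"formation": formation, "target": target, "groups": groups}
```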

V-C Unintentional Adversaries Decision

Fig. 3: Illustration of the multi-robot "Adapting The Edge" formation control algorithm for obstacle avoidance.

When the Explorers perceive the mountains, which can be regarded as static unintentional adversaries, they need to use the limited information obtained through communication and perception to pass them. In our experiment, the scenario can be described as in Fig. 3. There are nine robots, and two of them detect the mountain. To avoid a collision, a detecting robot needs to switch its current direction to the tangent direction at the nearest collision point. Since there are two possible tangent directions, the robot should select, based on the group's state, the direction toward which more robots are currently collision-free.

Specifically, consider the line through the nearest collision point perpendicular to the tangent as the boundary. On one side of this line there are five robots, but only four of them are collision-free, while the number of collision-free robots on the other side is three. The robot will therefore select the first direction and move a certain distance, then adjust its direction toward the goal point, and repeat the entire process until it perceives no unintentional adversaries on its route. Combining the two kinds of decisions, we present the entire decision process as Alg. 2. For simplicity, we only carry the most probable feasible solution of each level into the next level.

Input : Explorers’ and mountains’ states
Output : moving direction and distance
while a nearest collision point exists do
       calculate the numbers of collision-free agents on the two sides of the line that passes through the collision point and is perpendicular to the obstacle's tangent at that point;
       if the first side has more collision-free agents then
              moving direction = the first side of the boundary line;
              moving distance = one step of the agent's movement;
       else if both sides have the same number of collision-free agents then
              the agent stops;
       else if the second side has more collision-free agents then
              moving direction = the second side of the boundary line;
              moving distance = one step of the agent's movement;
       end if
end while
return moving direction = from the current position toward the goal point, moving distance
Algorithm 2 Adapting The Edge
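A sketch of the direction-selection step in Python (the geometry helpers and tie handling are our assumptions; teammate positions are assumed to be shared over the Explorers' communication):

```python
import math

def adapt_the_edge_direction(robot_xy, collision_xy, teammates_xy, colliding_ids):
    """Choose which tangent direction to follow around a static obstacle (Alg. 2 sketch).

    The boundary is the line through the collision point perpendicular to the obstacle
    tangent; the robot heads toward the side holding more collision-free teammates.
    """
    # Normal of the boundary line: from the robot toward the nearest collision point.
    nx, ny = collision_xy[0] - robot_xy[0], collision_xy[1] - robot_xy[1]
    tx, ty = -ny, nx                                   # tangent; travel options are +t and -t

    counts = {+1: 0, -1: 0}
    for i, (x, y) in enumerate(teammates_xy):
        if i in colliding_ids:
            continue                                   # only count collision-free robots
        proj = (x - collision_xy[0]) * tx + (y - collision_xy[1]) * ty
        counts[+1 if proj >= 0 else -1] += 1           # which side of the boundary line

    if counts[+1] == counts[-1]:
        return None                                    # tie: the agent stops, as in Alg. 2
    sign = +1 if counts[+1] > counts[-1] else -1
    norm = math.hypot(tx, ty) or 1.0
    return (sign * tx / norm, sign * ty / norm)        # unit vector along the chosen side
```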

VI Evaluation through Simulations

Considering the cross-platform support, scalability, efficiency, and extendability of the simulation, we chose Unity [10] to simulate the Exploration Game and selected Gambit [23] for calculating each level's Nash Equilibrium, because it is an open-source, cross-platform collection of tools for computation in game theory that can build, analyze, and explore game models.

Fig. 4: Experiments on Monsters with Different Distribution Considering Unintentional Adversaries

Our experimental evaluation focuses on the distribution of the agents' tactics in the possible solution space and on implementing predictive models with different parameters to analyse the system utility and cost. In our model, when an individual calculates the utility, it relates to three main factors: the number of agents, the individual unit attacking energy cost, and the agent's current energy level. Since the latter two parameters might not be known to both sides, we design two kinds of experiments to evaluate the system's performance. One is Complete Information, which means Explorers and Monsters know each other's status. The other is Incomplete Information, where agents need to use predictive models to estimate the opponent's states and parameters.

In each experiment, we first consider the situation with no unintentional adversaries and design 17 different scalability scenarios. Then we add the unintentional adversaries (the two mountain-like obstacles), fix both sides' numbers, and distribute the Monsters in different positions with different proportions to compare the system's performance. We also implement Collective Rationality and Individual Rationality on the Explorers' side for all the scenarios.

We suppose each Explorer initially has the same battery and HP levels, and every movement step costs energy. Every communication round and every attack also cost a fixed amount of energy. If an Explorer is attacked by a Monster, it loses a fixed amount of HP per hit. For the Monster, the per-attack energy cost and per-hit HP loss are defined analogously.

In the complete information strategy, we assume that if an individual agent can perceive the adversary, it knows the opponent's status, such as its unit attacking energy cost and energy level, and vice versa. For the incomplete information case, we use two different kinds of predictive models (linear and nonlinear) based on the individual HP cost to predict the opponent's unit attacking energy cost and current energy level.

Linear Predictive Model

In the linear predictive model, we use the agent's unit HP cost and the average system HP cost to predict the opponent's unit attacking energy cost and energy level, respectively, using two linear formulas.

Nonlinear Predictive Model

For the nonlinear predictive model, we simply replace the linear part of the above formulas with a natural logarithm. In both models the coefficients are fitting parameters, and the noise term follows a normal distribution.
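A sketch of what such predictive models could look like (the coefficients, the exact functional form, and the noise scale are illustrative assumptions, since the recovered text elides the formulas):

```python
import math
import random

def predict_opponent_linear(unit_hp_cost, avg_system_hp_cost, a=1.0, b=0.5, sigma=0.1):
    """Linearly map the observed per-hit HP cost and system-average HP cost to the
    opponent's unit attacking energy cost and current energy level, with Gaussian noise."""
    attack_energy = a * unit_hp_cost + random.gauss(0.0, sigma)
    energy_level = b * avg_system_hp_cost + random.gauss(0.0, sigma)
    return attack_energy, energy_level

def predict_opponent_nonlinear(unit_hp_cost, avg_system_hp_cost, a=1.0, b=0.5, sigma=0.1):
    """Same idea with the linear terms replaced by natural logarithms."""
    attack_energy = a * math.log(1.0 + unit_hp_cost) + random.gauss(0.0, sigma)
    energy_level = b * math.log(1.0 + avg_system_hp_cost) + random.gauss(0.0, sigma)
    return attack_energy, energy_level
```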

VI-A Environments with only intentional adversaries

In this setting, we consider three kinds of environments with a fixed number of Explorers (E=25) and Monsters (M=25) but with different Monster distributions (D2, D3, D4), as shown in Fig. 4.

Through this experiment, we compare the Explorer HP cost and the number of Explorers lost per Monster killed, which together evaluate the game's difficulty level. We also analyse the average HP and energy cost per Explorer to complete the task, which reflect the system's performance and utility. We found that the performance with Explorer cooperation is always better than without cooperation, which means collective rationality brings more benefit than self-interest when multiple agents work with each other. Also, the number of Explorers lost per Monster killed and the Explorer average HP cost show a positive linear correlation with the Monster-to-Explorer ratio.

Comparing the strategies of Explorers and Monsters in Table IV, we can notice that if the Monsters' main strategy is attacking, the Explorers will lose the game. On the other hand, if the Explorers' main strategy is attacking, or the ratio of attacking to defending actions is higher than a certain level, they will win the game.

Fig. 5: Experiments on Scalability and Complexity.
Fig. 6: Normalized frequency of choosing Attack-Nearest-Three Group in Explorers, and normalized frequency of choosing different behaviors of Monsters under different game strategies and environments (M=E=25).
Fig. 7: Normalized frequency of choosing Attack-Nearest-Three Group in Explorers.
Ra Com Incom L Incom NonL Noncoop
R R R R
5:45 - 0.06 10.00 - 0.03 13.71 - 0.64 11.29 - 0 5.50
10:40 - 0.10 2.54 - 0.15 1.39 - 0.16 1.31 - 0 0.63
15:35 - 0.22 1.58 - 0.25 0.60 - 0.46 0.68 - 0 0.34
20:30 8.40 0.33 1.39 0.46 - 0.88 0.56 - 0 0.26
25:25 0.29 30.08 0.31 0.21 - 0 0.22
30:20 0.24 0.31 0.22 - 0 0.15
35:15 0.18 0.24 0.23 - 0 0.04
40:10 0.18 0.18 0.09 0 0.26
45:5 0.23 0.20 0.17 0 0.17
25:15 0.22 0.17 0.20 - 0 0.23
25:20 0.17 0.20 0.34 - 0 0.11
25:30 4.88 0.28 0.31 0.20 - 0 0.33
25:35 - 0.59 0.26 - 0.95 0.37 - 2.10 0.34 - 0 0.34
15:25 - 2.65 0.46 - 0.61 0.53 - 0.89 0.51 - 0 0.75
20:25 - 0.48 0.28 - 9.25 0.37 - 2.69 0.37 - 0 0.12
30:25 0.24 1.58 0.31 89.00 0.25 - 0 0.21
35:25 0.19 0.38 0.20 - 0 0.13
TABLE IV: Strategy comparison with no unintentional adversaries. Ra: ratio of Explorers to Monsters, R: result, Com: complete information, Incom: incomplete information, L: linear, NonL: nonlinear, A: attacking, D: defending; the remaining columns report the frequency of agents' attack/defend behaviors.

VI-B Environments with unintentional adversaries

In this experiment, we consider 25 Monsters with four different distributions (Fig. 4). Through the analysis of system performance in Table V, we can clearly notice that the first scenario performs better than the others in terms of the Explorer average energy cost and the Explorer average HP cost. Figs. 6 and 7 show the distribution of strategies in the solution space for the different scenarios, which follows a normal distribution with different means and variances. If the variance is large, the current situation carries a lot of uncertainty, and the group tactics (Explorers) or the individual tactics (Monsters) will cover more possible combinations in the solution space.

Our experiments verify that collective rationality brings more benefit than self-interest when multiple agents work with each other. Also, by reducing the solution space through the GUT calculation, we analyse the correlation between system performance and the group and individual tactics. A video demonstration of the experiments is available at the anonymized link https://streamable.com/bmblm.

Envir. Com Incom L Incom NonL Noncoop
D1 48.72 82.18 0.91 52.08 91.72 1.05 50.53 80.09 0.90 6.60 - -
D2 59.89 90.95 1.00 67.67 80.86 0.64 63.10 90.42 0.92 7.91 - -
D3 63.51 80.62 0.65 62.18 89.87 0.95 62.85 89.18 1.10 5.66 - -
D4 62.80 93.20 1.09 56.20 90.16 1.00 58.64 82.72 0.74 6.54 - -
TABLE V: Performance comparison with unintentional adversaries. For each strategy, the columns report the Explorer average energy cost, the Explorer average HP cost, and the ratio of the numbers of Explorers and Monsters lost.

VII Conclusions

Our work introduces a general decision framework, the Game-Theoretic Utility Tree (GUT), to mimic the thinking process of intelligent agents in adversarial environments; it combines game theory, utility theory, probabilistic graphical models, and a tree structure. We define and formalize the Adversarial Environment from the robot's needs perspective and classify adversaries into unintentional and intentional. To tackle static unintentional adversaries in a multi-agent system, we present the Adapting The Edge distributed algorithm. Finally, we validate our approach through extensive simulation experiments.

The proposed architecture provides the decision level on top of our previous work SRSS, combining it with the low-level planning framework so that intelligent agents can adapt and cooperate in dynamic environments through their decisions. This approach also leaves plenty of future work for further research, such as individual learning from various scenarios to help the entire system improve, reducing duplicate calculations to save computing resources at the decision level, designing appropriate utility functions, and optimizing and building suitable predictive models and parameter estimation. In addition, we plan to implement our framework on real robots, which can help us develop better verification procedures for computational models of these systems.

References

  • [1] N. Agmon, Y. Elmaliah, Y. Mor, and O. Slor (2011) Robot navigation with weak sensors. In International Conference on Autonomous Agents and Multiagent Systems, pp. 272–276. Cited by: 4th item.
  • [2] N. Agmon, G. A. Kaminka, and S. Kraus (2011) Multi-robot adversarial patrolling: facing a full-knowledge opponent. Journal of Artificial Intelligence Research 42, pp. 887–916. Cited by: §I, 2nd item, §II.
  • [3] N. Agmon, S. Kraus, G. A. Kaminka, and V. Sadov (2009) Adversarial uncertainty in multi-robot patrol. In Twenty-First International Joint Conference on Artificial Intelligence, Cited by: 1st item.
  • [4] N. Agmon, S. Kraus, and G. A. Kaminka (2009) Uncertainties in adversarial patrol. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 2, pp. 1267–1268. Cited by: §I.
  • [5] C. Blum and D. Merkle (2008) Swarm intelligence. Swarm Intelligence in Optimization; Blum, C., Merkle, D., Eds, pp. 43–85. Cited by: §I.
  • [6] E. Bonabeau, M. Dorigo, and G. Theraulaz (1999) From natural to artificial swarm intelligence. Cited by: §I.
  • [7] P. Cheng (2003) A short survey on pursuit-evasion games. Department of Computer Science, University of Illinois at Urbana-Champaign. Cited by: §II.
  • [8] T. H. Chung, G. A. Hollinger, and V. Isler (2011) Search and pursuit-evasion in mobile robotics. Autonomous robots 31 (4), pp. 299. Cited by: §I, §I, §II.
  • [9] M. Dorigo, M. Birattari, and M. Brambilla (2014) Swarm robotics. Scholarpedia 9 (1), pp. 1463. Cited by: §I.
  • [10] Unity Technologies (2008) Unity game engine: official site. http://unity3d.com. Cited by: §VI.
  • [11] P. C. Fishburn (1970) Utility theory for decision making. Technical report Research analysis corp McLean VA. Cited by: 2nd item, §II.
  • [12] J. S. Jang and C. Tomlin (2005) Control strategies in multi-player pursuit and evasion game. In AIAA guidance, navigation, and control conference and exhibit, pp. 6239. Cited by: §II.
  • [13] M. I. Jordan (2003) An introduction to probabilistic graphical models. preparation. Cited by: §II.
  • [14] M. Jouini, L. B. A. Rabai, and A. B. Aissa (2014) Classification of security threats in information systems. Procedia Computer Science 32, pp. 489–496. Cited by: §I.
  • [15] M. Jun and R. D’Andrea (2003) Path planning for unmanned aerial vehicles in uncertain and adversarial environments. In Cooperative control: models, applications and algorithms, pp. 95–110. Cited by: §I.
  • [16] O. Keidar and N. Agmon (2017) Safety first: strategic navigation in adversarial environments. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 1581–1583. Cited by: 4th item.
  • [17] O. Keidar and N. Agmon (2018) Safe navigation in adversarial environments. Annals of Mathematics and Artificial Intelligence 83 (2), pp. 121–164. Cited by: 4th item.
  • [18] M. J. Kochenderfer (2015) Decision making under uncertainty: theory and application. MIT press. Cited by: §II.
  • [19] D. Koller and N. Friedman (2009) Probabilistic graphical models: principles and techniques. MIT press. Cited by: 2nd item, §II.
  • [20] A. Kolling and S. Carpin (2010) Multi-robot pursuit-evasion without maps. In 2010 IEEE International Conference on Robotics and Automation, pp. 3045–3051. Cited by: §II.
  • [21] E. S. Lin, N. Agmon, and S. Kraus (2019) Multi-robot adversarial patrolling: handling sequential attacks. Artificial Intelligence 274, pp. 1–25. Cited by: §II.
  • [22] V. R. Makkapati and P. Tsiotras (2019) Optimal evading strategies and task allocation in multi-player pursuit–evasion problems. Dynamic Games and Applications, pp. 1–20. Cited by: §II.
  • [23] R. D. McKelvey, A. M. McLennan, and T. L. Turocy (2006) Gambit: software tools for game theory. Cited by: §VI.
  • [24] R. B. Myerson (2013) Game theory. Harvard university press. Cited by: 2nd item, §II.
  • [25] J. F. Nash et al. (1950) Equilibrium points in n-person games. Proceedings of the national academy of sciences 36 (1), pp. 48–49. Cited by: §II.
  • [26] J. Paulos, S. W. Chen, D. Shishika, and V. Kumar (2019) Decentralization of multiagent policies by learning what to communicate. arXiv preprint arXiv:1901.08490. Cited by: §II.
  • [27] I. Rahwan, M. Cebrian, N. Obradovich, J. Bongard, J. Bonnefon, C. Breazeal, J. W. Crandall, N. A. Christakis, I. D. Couzin, M. O. Jackson, et al. (2019) Machine behaviour. Nature 568 (7753), pp. 477. Cited by: §II.
  • [28] N. Sanghvi, S. Nagavalli, and K. Sycara (2017) Exploiting robotic swarm characteristics for adversarial subversion in coverage tasks. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 511–519. Cited by: §I, §II.
  • [29] W. L. Scott III (2017) Optimal evasive strategies for groups of interacting agents with motion constraints. Ph.D. Thesis, Princeton University. Cited by: §II.
  • [30] J. S. Shamma (2007) Cooperative control of distributed multi-agent systems. Wiley Online Library. Cited by: §I.
  • [31] Y. Shapira and N. Agmon (2015) Path planning for optimizing survivability of multi-robot formation in adversarial environments. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4544–4549. Cited by: §I, 3rd item.
  • [32] J. Shen, X. Zhang, and V. Lesser (2004) Degree of local cooperation and its implication on global utility. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 2, pp. 546–553. Cited by: §I.
  • [33] S. Shivam, A. Kanellopoulos, K. G. Vamvoudakis, and Y. Wardi (2019) A predictive deep learning approach to output regulation: the case of collaborative pursuit evasion. arXiv preprint arXiv:1909.00893. Cited by: §II.
  • [34] R. Vidal, S. Rashid, C. Sharp, O. Shakernia, J. Kim, and S. Sastry (2001) Pursuit-evasion games with unmanned ground and aerial vehicles. In Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No. 01CH37164), Vol. 3, pp. 2948–2955. Cited by: §II.
  • [35] Q. Yang, Z. Luo, W. Song, and R. Parasuraman (2019) Self-Reactive Planning of Multi-Robots with Dynamic Task Assignments. In IEEE International Symposium on Multi-Robot and Multi-Agent Systems (MRS) 2019, Note: Extended Abstract Cited by: §IV.
  • [36] R. Yehoshua and N. Agmon (2015) Adversarial modeling in the robotic coverage problem. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pp. 891–899. Cited by: §I, 2nd item, §II.
  • [37] H. Zheng, J. Panerati, G. Beltrame, and A. Prorok (2019) An adversarial approach to private flocking in mobile robot teams. arXiv preprint arXiv:1909.10387. Cited by: §II.