Dynamic Psychological Game Theory for Secure Internet of Battlefield Things (IoBT) Systems

08/29/2018 ∙ by Ye Hu, et al. ∙ Virginia Polytechnic Institute and State University

In this paper, a novel anti-jamming mechanism is proposed to analyze and enhance the security of adversarial Internet of Battlefield Things (IoBT) systems. In particular, the problem is formulated as a dynamic psychological game between a soldier and an attacker. In this game, the soldier seeks to accomplish a time-critical mission by traversing a battlefield within a certain amount of time, while maintaining its connectivity with an IoBT network. The attacker, on the other hand, seeks to find the optimal opportunity to compromise the IoBT network and maximize the delay of the soldier's IoBT transmission link. The soldier's and the attacker's psychological behavior is captured using tools from psychological game theory, with which the players' intentions to harm one another are considered in their utilities. To solve this game, a novel learning algorithm based on Bayesian updating is proposed to find an ϵ-like psychological self-confirming equilibrium of the game. Simulation results show that, with error-free beliefs about the attacker's psychological strategies and beliefs, the soldier's material payoff can be improved by up to 15.11% compared to a conventional dynamic game without psychological considerations.


I Introduction

Emerging Internet of Things (IoT) technologies have led to significant changes in how autonomous systems are managed [1]. In a military environment, IoT technologies provide new ways for managing and operating a battlefield by interconnecting combat equipment, soldier devices, and other battlefield resources [2]. This integration of the IoT with military networks is referred to as the Internet of Battlefield Things (IoBT) [1]. In an IoBT, the connectivity between the wearables carried by the soldiers and other IoBT devices, such as multipurpose sensors, autonomous vehicles, and drones, plays a significant role in mission-critical battlefield operations [3]. However, the connectivity between these devices is highly vulnerable to cyber attacks, given the adversarial nature of the battlefield coupled with the limitations of the IoBT devices' security mechanisms [4]. Moreover, in an adversarial battlefield environment, the psychology of the soldiers and attackers can significantly influence their behavior and, subsequently, the security of the IoBT network.

I-A Related Works

The existing literature has studied a number of problems related to the security of the IoBT [2, 3, 4, 5]. In [2], the communications and information management challenges of the IoBT are investigated. The work in [3] integrates the IoT and network-centric warfare to enhance IoBT integrity. The authors in [4] use a feedback Stackelberg solution to dynamically optimize the connectivity of an adversarial IoBT network. The work in [5] develops a mean-field game approach to analyze the spread of misinformation in an adversarial IoBT. Despite their promising results, these existing works [2, 3, 4] mostly rely on static constructs and do not consider the influence of the human players' psychology and potential bounded rationality when making decisions or choosing strategies within an IoBT setting. Indeed, the behavioral aspect of human decision making, which leads agents to deviate from fully rational, objective behavior in an IoBT, has a direct impact on the security of the IoBT network. Hence, this aspect must be accounted for when studying and assessing the security of the IoBT.

Recently, there has been significant interest in studying human behavior and its impact on cyber-physical security. The authors in [6] study a common-pool resource game that captures the players' risk preferences using tools from prospect theory. The work in [7] uses prospect theory to study the effect of a defender's and an attacker's subjective behavior on the security of a drone delivery system. The work in [8] uses prospect theory to analyze the interaction between the defender of a cloud storage system and an attacker targeting the system with advanced persistent threats. In [9], a cognitive hierarchy theory based approach is proposed to capture the bounded rationality of defenders and attackers in cyber-physical systems. These previous works present interesting and novel results. However, the existing literature has not yet considered and analyzed the influence of players' psychology on game-theoretic decision making in IoT networks. In fact, recent works in the game theory literature have shown that decision making is strongly impacted by human psychology and have studied various game aspects and solutions while accounting for psychological factors [10, 11, 12, 13]. In this regard, the work in [10] proves the existence of sub-game perfect and sequential equilibria in psychological games. The authors in [11] study a game-theoretic model that captures dynamic psychological effects and develop new psychological game solution concepts. The work in [12] considers the behavioral consequences of psychology in the presence of blaming behaviors. In addition, the effect of human psychology in mean-field-type games is studied in [13]. Despite their promising results, these existing works on psychological game theory and its applications [10, 11, 12, 13] have not analyzed the potential adoption of psychological game approaches in security scenarios. In [14], we studied how a soldier's and an attacker's psychology can impact an IoBT network's security.
However, in [14], the players' resource limitations and IoBT connectivity objectives are not considered. In addition, in [14], the soldier's actions at each step in the battlefield reveal the soldier's preference over its future actions; as such, the psychological forward induction of [14] can be used to solve the security problem proposed therein. Yet, in a real battlefield, the soldier's actions can be rather independent at each time step, making the psychological forward induction based solution of [14] infeasible. Thus, there is a need for new solutions that dynamically predict and react to the actions of adversaries in the battlefield, while taking the players' resource limitations and IoBT connectivity objectives into consideration.

I-B Contributions

The main contribution of this paper is to analyze the psychological behavior of human decision makers in an adversarial IoBT network, in the presence of stringent resource limitations for the players (i.e., a time limitation and a power limitation). To the best of our knowledge, this is the first work that jointly considers the players' resource limitations and their psychological behavior for securing an IoBT network. Our key contributions include:

  • We develop a novel framework to dynamically optimize the connectivity between a soldier and the IoBT network. We consider a battlefield in which a soldier must accomplish a time-critical mission that requires traversing the battlefield while maintaining connectivity with the IoBT network. Meanwhile, the attacker in the battlefield is interested in compromising the soldier's IoBT connectivity by selectively jamming the IoBT network at each time instant in the battlefield. The soldier, acting as a defender, will selectively connect to certain IoBT devices at each time instant along its mission path, so as to evade the attack.

  • We formulate this IoBT security problem as a dynamic game, in which the soldier attempts to predict and evade the attacker's attack at each time instant in the battlefield to minimize its cumulative expected retransmission delay, while the attacker aims at optimally targeting the IoBT devices to maximize the soldier's retransmission delay while accounting for its limited cumulative jamming power. Both the soldier's time limitation and the attacker's power limitation are considered in the formulated game. In this regard, we prove the uniqueness of the Nash equilibrium (NE) of this game under a set of defined conditions, and we study the resulting NE strategies, which allows analysis of the optimal decision making processes of the soldier and attacker based on the beliefs each builds about its opponent's strategies.

  • We perform a fundamental analysis of the soldier's and attacker's psychology in the battlefield using the framework of psychological game theory [11]. In the formulated psychological game, the psychology of the players (i.e., the soldier and the attacker) is modeled as their intention to frustrate each other. The frustration of each player is quantified as the gap, if positive, between its expected payoff and its actual payoff. A psychological equilibrium (PE) is used to solve the psychological IoBT game. In this regard, we prove the uniqueness of the PE for our proposed psychological game under the same set of conditions under which the NE is unique. In addition, our analytical results show that, in an attempt to frustrate the soldier, at the PE, the attacker is more prone to attack the IoBT device with the best channel conditions.

  • We propose a Bayesian updating algorithm to establish the players' belief system, so as to solve the proposed psychological IoBT game. In this regard, the algorithm characterizes what is known as an ϵ-like psychological self-confirming equilibrium (PSCE) of our proposed psychological game.

  • The results also show that, based on its error-free beliefs about the attacker's psychological strategies and beliefs, the soldier can obtain a gain of up to 15.11% in its expected material payoff at equilibrium, compared to a conventional dynamic game. Meanwhile, using Bayesian updating, the soldier and the attacker can achieve ϵ-like beliefs, such that an ϵ-like PSCE of the formulated psychological game can be reached. Simulation results also show that non-error-free beliefs, which may result, for example, from a limited number of iterations of the Bayesian updating algorithm, can yield a loss in terms of the soldier's expected material payoff.
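The Bayesian belief-formation step referenced above can be illustrated with a minimal sketch: a Dirichlet-style pseudo-count update of one player's first-order belief from observed opponent actions. The action labels, the uniform prior, and the point-estimate normalization are assumptions for illustration, not the paper's exact algorithm.

```python
def bayesian_update(counts, observed_action):
    """Increment the pseudo-count of the observed opponent action and return
    the normalized counts as a point estimate of the opponent's mixed
    strategy at this history."""
    counts[observed_action] = counts.get(observed_action, 0.0) + 1.0
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

# Uniform pseudo-count prior over the attacker's two (illustrative) actions.
belief_counts = {"jam": 1.0, "idle": 1.0}
belief = None
for obs in ["idle", "idle", "jam", "idle"]:  # observed attacker actions
    belief = bayesian_update(belief_counts, obs)
```

Repeating the update at every history lets the estimated strategy converge toward the opponent's empirical play, which is the mechanism behind the ϵ-like beliefs discussed above.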

The rest of this paper is organized as follows. The system model and problem formulation are described in Section II. The psychological analysis of the soldier and attacker is presented in Section III. The Bayesian updating based solution of the psychological IoBT game is proposed in Section IV. In Section V, simulation results are presented and analyzed. Finally, conclusions are drawn in Section VI.

II System Model and Problem Formulation

Consider a battlefield in which a soldier seeks to move from an origin to a destination along a predefined path, using a minimum amount of time, as shown in Fig. 1. At the same time, this soldier tries to communicate with a set of IoBT devices uniformly deployed along this path, to get access to situational awareness within the battlefield and to receive instructions from the battlefield commander. The soldier can only associate with one IoBT device at each location. The soldier should communicate with IoBT devices along the path so as to keep its total downlink transmission delay below a maximum tolerable delay, while getting access to the required information on time. Meanwhile, in this battlefield, an attacker seeks to disrupt the connectivity between the soldier and the IoBT devices by jamming the communication links. Given the limitation on its total power, the attacker can only compromise (i.e., jam) the IoBT network a limited number of times along the path. At each step in this battlefield, the soldier and attacker will sequentially choose strategies to realize their objectives, based on their perfect observation of what has happened in the battlefield. Here, the soldier's objective is to minimize its communication delay, while the attacker's objective is to maximize the soldier's communication delay while minimizing its own total power consumption.

Fig. 1: Soldier battlefield security graph.

Ii-a Soldier’s communication delay

We assume that the soldier (attacker) chooses to connect with (jam) the IoBT network at each step t, sequentially, until the soldier arrives at its destination. The soldier communicates with each IoBT device m over a downlink channel. The signal-to-interference-plus-noise ratio (SINR) of the downlink channel between the soldier and IoBT device m at step t is given by:

γ_{m,t} = P_s h_m d_{s,m}^{−β} / (a_t P_J g_m d_{j,m}^{−β} + σ²),   (1)

where P_s and P_J are, respectively, the transmit powers of the soldier and the attacker, and a_t ∈ {0, 1} indicates whether the attacker jams at step t. h_m d_{s,m}^{−β} is the path loss between the soldier and IoBT device m, with h_m being the Rayleigh fading parameter, d_{s,m} being the distance between the soldier and IoBT device m, and β the path loss exponent. g_m d_{j,m}^{−β} is the path loss between the attacker and IoBT device m, with g_m being the Rayleigh fading parameter, and d_{j,m} being the distance between the attacker and IoBT device m. σ² is the power of the Gaussian noise. At each step t, based on the probability distributions of the Rayleigh fading parameters h_m and g_m, the probability that the soldier's received SINR, γ_{m,t}, is higher than a threshold, γ_0, in one time slot is given by:

p_{m,t} = Pr(γ_{m,t} > γ_0) = ∫₀^∞ ∫_{h₀(g)}^∞ f(h) f(g) dh dg,   (2)

where h₀(g) = γ_0 (a_t P_J g d_{j,m}^{−β} + σ²) / (P_s d_{s,m}^{−β}). Here, f(h) and f(g) are the probability density functions of the Rayleigh fading parameters h_m and g_m, respectively. In the studied battlefield, the soldier attempts to maintain a probability of achieving an SINR exceeding γ_0 that is higher than a threshold θ. Hence, the soldier will request n_{m,t} retransmissions of the downlink data from IoBT device m, so as to cope with the channel occasionally experiencing small-scale fading. Thus, n_{m,t} is given by:

n_{m,t} = ⌈ log(1 − θ) / log(1 − p_{m,t}) ⌉,   (3)

under the goal of maintaining 1 − (1 − p_{m,t})^{n_{m,t}} ≥ θ. Thus, the retransmission delay of the soldier at each step is given by l_{m,t} = n_{m,t} τ_t. Here, τ_t is the average unit transmission delay, which is the average duration of a successful packet transmission at the physical medium of one resource block, at step t [15], and depends on the size of one resource block and the bandwidth of the channel between the soldier and device m.
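The per-step link model above can be sketched numerically as follows. All parameter values are placeholders consistent with the description rather than the paper's values, and unit-rate exponential fading power gains stand in for the Rayleigh parameters; the success probability is estimated by Monte Carlo rather than by evaluating the integral in closed form.

```python
import math
import random

def sinr_success_prob(p_s, p_j, d_s, d_j, alpha, sigma2, gamma_th,
                      jammed, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the per-slot probability that the soldier's
    SINR exceeds gamma_th, with unit-rate exponential fading power gains
    standing in for the Rayleigh parameters."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        h = rng.expovariate(1.0)  # soldier-link fading power gain
        g = rng.expovariate(1.0)  # attacker-link fading power gain
        interference = p_j * g * d_j ** (-alpha) if jammed else 0.0
        sinr = p_s * h * d_s ** (-alpha) / (interference + sigma2)
        hits += sinr > gamma_th
    return hits / n_samples

def retransmissions_needed(p_success, target):
    """Smallest number of transmissions n with 1 - (1 - p)^n >= target."""
    if p_success >= target:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))
```

For example, a per-slot success probability of 0.5 requires four transmissions to reach a 90% overall success target, which is the mechanism driving the retransmission delay above.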

II-B Strategies of the players

In the studied battlefield, the objective of the soldier is to effectively maintain a low transmission delay. Thus, the soldier will attempt to communicate with the IoBT devices that will not be attacked. The soldier's action space at each step consists of binary association indicators, one per IoBT device: at each step, an indicator value of 1 means that the soldier builds a communication link with the corresponding IoBT device, whereas a value of 0 means that the soldier does not communicate with that device.

On the other hand, the objective of the attacker is to increase the retransmission delay of the soldier. As such, under a constraint on its total power consumption, the attacker will find the best time instants to launch an attack on the IoBT network, so as to decrease the SINR of the communication channel between the soldier and the IoBT network. The attacker's set of possible actions at each step contains two actions: jamming the IoBT network, or refraining from jamming it. Note that the jamming power that will be used by the attacker is assumed to be constant.

In addition, we refer to the sequence of actions that have been taken by each of the players before reaching step t as the history at step t, and we consider the set of all possible histories at each step. We also consider the sequence of actions taken by each player up to step t, including the action pair taken at step t. After observing the history at step t, the soldier and attacker will find the optimal strategies at the current step to realize their individual objectives. Each player then chooses from its set of feasible actions at the observed history. A terminal history is a history at which the soldier reaches its destination, and we consider the set of all such terminal histories.

In an adversarial IoBT environment, the soldier will randomize its action selection at each history, so as to make it harder for the attacker to guess the IoBT device to which the soldier aims to connect. The soldier will, hence, choose a probability distribution over its feasible action set at each history; this probability distribution is the soldier's mixed strategy at that history. A possible strategy for the soldier in the battlefield can, then, be represented by the set of its mixed strategies at all possible histories, and we consider the set of all such feasible soldier strategies.

A similar randomization logic is used by the attacker. The attacker seeks to choose a probability distribution over its feasible action set at each history, so as to maximize the soldier's transmission delay while keeping its consumed jamming power at a minimum. This probability distribution is the attacker's mixed strategy at that history. A possible strategy for the attacker can, hence, be represented by the set of its mixed strategies at all possible histories, and we consider the set of all such feasible attacker strategies.
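A minimal data-structure sketch of the notions above: a history is the tuple of (soldier action, attacker action) pairs taken so far, and a mixed strategy maps each history to a probability distribution over feasible actions. The device indices and action labels below are illustrative assumptions, not the paper's notation.

```python
import random

def sample_action(mixed_strategy, history, rng):
    """Draw an action at `history` from a player's mixed strategy, i.e.,
    from its probability distribution over feasible actions there."""
    r, acc = rng.random(), 0.0
    for action, p in mixed_strategy[history].items():
        acc += p
        if r < acc:
            return action
    return action  # guard against floating-point rounding

rng = random.Random(7)
# At the empty history, the soldier picks device 0 or 1 uniformly,
# while the attacker jams with probability 0.3.
soldier_strategy = {(): {0: 0.5, 1: 0.5}}
attacker_strategy = {(): {"jam": 0.3, "idle": 0.7}}
step = (sample_action(soldier_strategy, (), rng),
        sample_action(attacker_strategy, (), rng))
history = (step,)  # the history after one step of play
```

Extending `history` step by step, with each prefix used as the strategy key, reproduces the sequential play described above.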

II-C Material payoff

We define the soldier’s material payoff as the normalized gap between the sum of the soldier’s actual communication delay and the soldier’s maximum tolerable communication delay. Meanwhile, we define the attacker’s material payoff as the weighted sum of the soldier’s time delay and the attacker’s power consumption.

Note that, in (2), the success probability depends on both the soldier's and the attacker's actions in the form:

(4)

where the indicator function equals one only when the attacker's current action is to jam. Hence, the soldier's time delay, when attempting to communicate with a given IoBT device at a given history, is:

(5)

In case the soldier does not communicate with the IoBT device at a given history, the soldier will naturally not incur any delay at that history. As such, the soldier's retransmission delay is a function of the soldier's and attacker's actions. At a terminal history, the soldier's accumulated communication delay will be:

(6)

where and represent, respectively, the soldier’s and attacker’s action at step in . Note that, under each terminal history , and . Here, we note that, even though not communicating with any device will lead to a minimum delay for the soldier, this is not a feasible strategy for the soldier, since by definition, the soldier has to communicate with devices in the battlefield so as to acquire situational awareness. Based on its primary objective, the soldier will determine an optimal strategy that minimize its expected total time delay. This, hence, requires maximizing the normalized gap between the cumulative retransmission delay and the maximum tolerable delay, which is defined as :

(7)

where the probability of occurrence of each terminal history is induced by the soldier's and the attacker's mixed strategies as follows:

(8)

where each history along a terminal history is a prefix of it, i.e., the sequence of actions in the terminal history taken before that step. Hence, (7) represents the soldier's expected utility (or, equivalently, expected material payoff) achieved under the given strategy pair.

Meanwhile, the material payoff of the attacker at a terminal history is defined as:

(9)

where the two weights represent, respectively, the relative importance of the soldier's time delay and of the attacker's power consumption, and sum to one. Thus, in this battlefield, the attacker will select the optimal strategy that maximizes the soldier's time delay¹, while minimizing its power consumption, which can be captured by maximizing the following expected utility (i.e., expected material payoff):

¹Here, we assume that the soldier's delay requirements can be learnt by the attacker using its knowledge of the IoBT devices' quantity and channel conditions, or through, for example, a prior reconnaissance phase about the soldier and its objectives.

(10)
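The two material payoffs, and the product-form probability of a terminal history in (8), can be sketched as follows. The normalization by the maximum tolerable delay and the convention that the attacker's two weights sum to one follow the text; the variable names are assumptions.

```python
def soldier_material_payoff(step_delays, max_delay):
    """Normalized gap between the maximum tolerable delay and the
    accumulated retransmission delay (larger is better for the soldier)."""
    return (max_delay - sum(step_delays)) / max_delay

def attacker_material_payoff(step_delays, jam_flags, p_jam, w_delay, w_power):
    """Weighted sum of the soldier's delay (to be maximized) and the
    attacker's jamming-power consumption (to be minimized); the two
    weights are assumed to sum to one."""
    return w_delay * sum(step_delays) - w_power * p_jam * sum(jam_flags)

def terminal_history_prob(history, soldier_strat, attacker_strat):
    """Probability of a terminal history as the product, over its steps, of
    both players' mixed-strategy probabilities at each prefix history
    (cf. the product form in (8))."""
    prob = 1.0
    for t, (a, b) in enumerate(history):
        prefix = history[:t]
        prob *= soldier_strat[prefix][a] * attacker_strat[prefix][b]
    return prob
```

Weighting each terminal history's payoff by `terminal_history_prob` and summing gives the expected material payoffs in (7) and (10).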

In this battlefield, the attacker can track the soldier's location via GPS, and it can gather intelligence (i.e., knowledge) on the soldier's objective. However, it does not know the IoBT devices to which the soldier will connect. Meanwhile, the soldier knows that the attacker is present, but does not know which IoBT devices it will target. Then, to determine their optimal actions at each history, the soldier and attacker each form an estimate of their opponent's actions (e.g., the attacker estimates the soldier's actions, and the soldier estimates the attacker's actions). This estimate constitutes the soldier's and attacker's first-order beliefs about each other. In particular, at each history, the soldier holds a vector of beliefs on the probability distribution of the attacker's actions, and the attacker holds a belief vector on the probability distribution of the soldier's actions. As such, each player maintains a set of first-order beliefs covering every possible history at each step. Hereinafter, we refer to these as the soldier's set of first-order beliefs on the attacker and the attacker's set of first-order beliefs on the soldier.

Based on its first-order belief, the soldier's perceived (i.e., belief-based) expected material payoff will be given by:

(11)

where the belief-based probability of occurrence of each terminal history is induced by the soldier's strategy and first-order belief as follows:

(12)

Similarly, given its first-order belief, the attacker's perceived (i.e., belief-based) expected material payoff under its strategy will be:

(13)

where the belief-based probability of occurrence of each terminal history is induced by the attacker's strategy and first-order belief as follows:

(14)

II-D Game formulation

In the studied IoBT scenario, the primary objective of the soldier is to find a strategy to effectively evade the jamming attack of the attacker, while the objective of the attacker is to find an attack strategy that effectively jams the soldier's communication with the IoBT devices. As such, we formulate a dynamic game to capture the dependence between the objectives and the actions of the soldier and the attacker. The game is defined by: its set of players, which includes the soldier and the attacker; its set of histories, representing the sequences of actions that have been taken by each of the players before reaching a certain stage; its set of terminal histories, at which point the soldier reaches its destination and the game ends; the expected utilities of the soldier and the attacker, defined in (7) and (10); and their belief-dependent (i.e., perceived) expected utilities, defined in (11) and (13). In this formulated game, the soldier and the attacker each aim at maximizing their (belief-based) expected utilities. When the beliefs of each player accurately predict the strategy of the opponent, and when each player chooses a strategy that maximizes its expected utility based on those beliefs, these strategies give rise to a Nash equilibrium (NE) for the proposed game, which is formally defined as follows:

Definition 1.

A Nash equilibrium (NE) for the formulated dynamic game is defined as a profile of strategies and first-order beliefs in which the strategies are rational, such that:

(15)
(16)

while the first-order beliefs are error-free, such that, for every feasible action of the attacker at each history in the game:

(17)

and, for every feasible action of the soldier at each history in the game:

(18)

Thus, at an NE of the proposed game, both the soldier and the attacker correctly estimate their opponent's strategy (as represented by an error-free set of beliefs over the opponent's strategy) and make rational determinations on their own strategies based on these error-free beliefs, at every history of the game. As shown in Definition 1, the rational strategies of the players (i.e., the soldier and the attacker) are the strategies that maximize the players' perceived expected payoffs based on their error-free beliefs. At each history, each player's error-free first-order belief about each of its opponent's feasible actions equals the probability that the opponent chooses this action under its rational strategy. The players, including the soldier and the attacker, are considered to hold accurate (error-free) beliefs in the computation of their respective NE strategies. Hence, these error-free beliefs require that, at equilibrium, beliefs accurately predict the opponent's strategy. However, in practical networks, the players' beliefs may not be fully accurate when no effective prediction method is used. Hence, when solving (15) and (16), the resulting soldier and attacker strategies are rational (i.e., optimal), but are based on their respective beliefs. If these beliefs are not accurate (i.e., if (17) and (18) are not met), even though each of the players is still acting rationally, their strategies may deviate from the NE strategies.
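Definition 1 can be checked numerically for a one-step version of the game: with error-free beliefs, each player's belief coincides with the opponent's actual mixed strategy, so a profile is an equilibrium if and only if no pure-action deviation improves either player's expected payoff. The 2x2 payoff matrices below (soldier evades if the attacker jams the other device) are illustrative assumptions, not the paper's utilities.

```python
def expected_payoff(matrix, sigma, tau):
    """Expected payoff when matrix[i][j] is the payoff for soldier action i
    and attacker action j, under mixed strategies sigma and tau."""
    return sum(sigma[i] * tau[j] * matrix[i][j]
               for i in range(len(sigma)) for j in range(len(tau)))

def is_equilibrium(u_soldier, u_attacker, sigma, tau, eps=1e-9):
    """Best-response check under error-free beliefs: the soldier's belief is
    tau (the attacker's true strategy) and vice versa; no pure deviation may
    improve either player's expected payoff."""
    base_s = expected_payoff(u_soldier, sigma, tau)
    base_a = expected_payoff(u_attacker, sigma, tau)
    for i in range(len(sigma)):
        pure = [float(k == i) for k in range(len(sigma))]
        if expected_payoff(u_soldier, pure, tau) > base_s + eps:
            return False
    for j in range(len(tau)):
        pure = [float(k == j) for k in range(len(tau))]
        if expected_payoff(u_attacker, sigma, pure) > base_a + eps:
            return False
    return True
```

For the evasion payoffs below, the uniform mixing profile passes the check, while a soldier who always picks device 0 invites a profitable attacker deviation, mirroring the deviation-from-NE discussion above.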

Moreover, in practice, as emotional human players, the soldier and the attacker may also deviate from their NE strategies [16]. In this case, despite being theoretically valid, the error-free beliefs defined in Definition 1 may not be consistent with the players' actual emotional strategies. As such, the rational strategies that maximize the players' belief-based perceived expected payoffs may not maximize the players' real expected payoffs when the opponent deviates from its fully rational strategies due to behavioral factors [11]. In addition, the soldier's and attacker's subjective emotions may also modify their objective functions to incorporate additional subjective goals. These psychological facets of the players' behavior in an IoBT network are studied next using the framework of a dynamic psychological game [11].

III Dynamic Psychological Game

The formulated dynamic game in Section II captures the primary objectives of the soldier and attacker and the interdependence between these objectives. In this game, each player, by using a set of beliefs about the opponent's strategy, aims at computing its optimal strategy to maximize its respective expected utility. Hence, the beliefs are solely a means by which a player estimates its opponent's strategy in order to choose its own optimal strategy; they are not part of the utility function of each player. However, given the psychological (i.e., human) nature of the players in our game, their expectations, beliefs, and emotions have a direct effect on the way they perceive the outcome of the game. Indeed, by not achieving their expected (or belief-based) payoff, the soldier or attacker will experience frustration or anger, which has a direct impact on the way they assess and perceive the outcome of the game. In addition, due to the adversarial nature of the relationship between the soldier and attacker, beyond achieving their own objectives of maximizing or minimizing the communication delay, each may also strive to intentionally hurt the opponent, by aiming at frustrating it or, more generally, causing it psychological (i.e., emotional) damage. Hence, incorporating this psychological aspect in the formulation of the utility functions enables a more general and representative game analysis that can realistically capture the psychological decision making processes and behavior of both the soldier and the attacker.

To this end, we next incorporate notions from psychological games [11] in our game formulation to capture and analyze this psychological aspect of the decision making processes of the attacker and soldier. As such, in our introduced psychological game, the players' expectations and beliefs will now be directly incorporated in their utility functions. In addition, given their objective to frustrate and anger the opponent, each player aims at anticipating the payoff that the opponent expects. To this end, in addition to building beliefs over the opponent's strategies, each player also builds a belief system over the opponent's beliefs. This, hence, enables anticipating the expectations of the opponent and, as a result, maximizing its frustration.

III-A Psychology in the battlefield

In the aforementioned IoBT scenario, when one player (i.e., the soldier or the attacker) chooses its strategy such that its opponent receives a material payoff lower than expected, this player successfully frustrates its opponent. For example, if the soldier believes that the attacker will not attack a given IoBT device at some step, it will communicate with this IoBT device and expect to achieve a certain material payoff. If, in reality, the attacker did attack this device, then the material payoff of the soldier will be lower. The gap between the expected and the achieved payoffs quantifies the soldier's frustration. Note that the soldier and attacker only feel frustrated when they get a lower material payoff compared to their expected material payoff. Hence, in our proposed psychological game formulation, the soldier and attacker will intentionally attempt to frustrate each other while also seeking to achieve their own individual objectives. Ultimately, the soldier's and attacker's intention to frustrate each other, combined with their individual objectives (i.e., to minimize or maximize the soldier's communication delay), will determine their strategies in the battlefield.

To consider their opponents' frustration in their own payoffs, the soldier and attacker should estimate their opponent's expected payoffs. This estimation requires the soldier and the attacker to build beliefs about their opponent's first-order beliefs, i.e., to build second-order beliefs. At each history, the soldier holds a second-order belief vector on the attacker's first-order belief, and the attacker holds a second-order belief vector on the soldier's first-order belief. Hereinafter, each player maintains a set of such second-order beliefs covering every possible history.

III-B Soldier and attacker's frustration

We define the soldier’s and attacker’s frustration as the gap between their expected material payoffs, respectively defined in (11) and (13), and their actual material payoffs. This frustration, indeed, stems from the fact that the soldier (attacker) may choose an action () that may be different from what the attacker (soldier) has anticipated based on its belief (). Thus, in the considered IoBT network, under terminal history , the soldier’s frustration with strategy and belief will be given by (given that the soldier aims at maximizing defined in (11)):

(19)

where the subtracted term is the soldier's actual payoff under the terminal history. Note that the attacker has no knowledge of the soldier's first-order belief and strategy. Hence, based on its sets of first-order and second-order beliefs on the soldier's strategy and first-order belief, the attacker can form a belief-based perception of the soldier's frustration when a terminal history occurs, expressed as follows:

(20)

where the attacker's perceived belief-based probability of occurrence of the terminal histories is induced by its first-order and second-order beliefs as follows:

(21)

Thus, combining the attacker’s primary objective of maximizing the soldier’s communication delay at a minimum needed total jamming power with its intention to frustrate the soldier results in the following belief-based expected psychological payoff (i.e. belief-based expected psychological utility):

(22)

where the weighting parameter represents the attacker's motivation and willingness to frustrate the soldier.

Similarly, under a terminal history, the frustration of the attacker under its strategy and first-order belief will be:

(23)

where is the attacker’s actual payoff at terminal history, as defined in (9). Based on its first-order and second-order beliefs, and , the soldier can form a belief-based perception of qualify the attacker’s frustration, denoted by , when terminal history is achieved, as follows:

(24)

where the weighting factor is the perceived belief-based probability of occurrence of the terminal history, based on the soldier’s first-order and second-order beliefs, and is defined as:

(25)

Then, the soldier’s goal of minimizing its expected communication delay, combined with its intention to frustrate the attacker, can be captured by the following belief-based expected psychological payoff (i.e., belief-based expected psychological utility):

(26)

where the weighting parameter represents the soldier’s motivation to frustrate the attacker.

III-C Dynamic Psychological Game

To capture the decision-making processes of the soldier and attacker, we introduce a dynamic psychological game where, similarly to the dynamic game defined in Section II-E, the set of players includes the soldier and the attacker, together with the set of histories and the set of terminal histories. In addition, the soldier’s and the attacker’s psychological expected utilities are as defined in (26) and (22), respectively. In this psychological game, the soldier and the attacker aim at maximizing their belief-based psychological expected utilities. In this regard, when the first-order and second-order beliefs of each player correctly predict the strategy and the first-order belief of the opponent, and when each player chooses a strategy that maximizes its belief-based psychological expected utility based on those correct beliefs, these strategies give rise to a psychological equilibrium (PE) [11]. The PE of our proposed psychological game is formally defined as follows:

Definition 2.

The psychological equilibrium of the formulated psychological game is a profile of strategies and beliefs in which the strategies are rational, such that:

(27)
(28)

while the first-order and second-order beliefs are error-free, such that, at each history:

(29)

and, similarly, at each history:

(30)
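The equilibrium conditions of Definition 2 can be sketched programmatically: beliefs must coincide with the opponent’s actual strategy (error-free), and, with beliefs held fixed, no deviation may improve the belief-based psychological utility. The example below is a simplified one-shot illustration with purely material utilities for brevity (a psychological term would simply enter the utility functions); all names, actions, and payoffs are our assumptions:

```python
def is_psychological_equilibrium(strat_s, strat_a, belief_s, belief_a,
                                 u_s, u_a, actions_s, actions_a, tol=1e-9):
    """strat_*: each player's mixed strategy (dict action -> probability).
    belief_s: soldier's belief about the attacker's strategy (and vice versa).
    u_*(strategy, belief): belief-based psychological utility; beliefs are held
    fixed when checking deviations, as in psychological game theory."""
    # Error-free belief conditions (cf. (29)-(30)).
    if belief_s != strat_a or belief_a != strat_s:
        return False

    def pure(a, actions):
        return {x: (1.0 if x == a else 0.0) for x in actions}

    # Rationality conditions (cf. (27)-(28)): no pure deviation improves the
    # utility (sufficient here, since utilities are linear in one's own strategy).
    if any(u_s(pure(a, actions_s), belief_s) > u_s(strat_s, belief_s) + tol
           for a in actions_s):
        return False
    if any(u_a(pure(a, actions_a), belief_a) > u_a(strat_a, belief_a) + tol
           for a in actions_a):
        return False
    return True

# Illustrative zero-sum stage interaction (matching-pennies-like payoffs).
acts_s, acts_a = ["tx", "quiet"], ["jam", "idle"]
pay = {("tx", "jam"): -1.0, ("tx", "idle"): 1.0,
       ("quiet", "jam"): 1.0, ("quiet", "idle"): -1.0}
u_s = lambda sig, bel: sum(sig[s] * bel[a] * pay[(s, a)]
                           for s in acts_s for a in acts_a)
u_a = lambda sig, bel: -sum(bel[s] * sig[a] * pay[(s, a)]
                            for s in acts_s for a in acts_a)
half_s = {"tx": 0.5, "quiet": 0.5}
half_a = {"jam": 0.5, "idle": 0.5}
print(is_psychological_equilibrium(half_s, half_a, half_a, half_s,
                                   u_s, u_a, acts_s, acts_a))  # True
```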

The principal difference between the PE and the NE (which is also the underlying difference between the proposed standard dynamic game and the proposed psychological game) is that the utility function of each player depends not only on the strategy or action chosen by the opponent, but also on the opponent’s beliefs. In other words, the payoff of each player in the psychological game depends not only on what the opponent does, but also on what the opponent thinks. This enlarges the domain of analysis of the game to incorporate psychological aspects of the players’ decision-making processes, which are not typically present in a traditional dynamic game formulation. Hence, even though the definition of the PE still requires that the first-order and second-order beliefs of each player be error-free, since these beliefs are incorporated in the payoffs of each player, they have a direct effect on the rationally chosen (i.e., PE) strategies. In essence, at a PE, the players’ intention to frustrate one another is captured, as the soldier and the attacker rationally choose their strategies to maximize both their belief-based expected material payoffs and their opponents’ frustration, based on their error-free first-order and second-order beliefs. Based on [11], there always exists at least one such PE in the formulated psychological game. In particular, under the assumptions that i) evaded attacks at a given history yield higher expected material payoffs for the soldier and lower expected material payoffs for the attacker, at the current and future histories, and ii) a successful (unjammed) communication at a given history yields a higher expected material payoff for the soldier and a lower expected material payoff for the attacker, at the current and future histories, we can derive Theorem 1 and Theorem 2:

Theorem 1.

The NE and the PE of, respectively, the conventional dynamic game and the psychological game are unique.

Proof.

At history , the soldier chooses its action from , while the attacker chooses its action from . We represent the soldier’s and attacker’s payoffs when the soldier takes action and the attacker takes action , where , by and , respectively. Here, and include the instantaneous payoffs the soldier and attacker receive when taking their action pair at history as well as future expected payoffs at the following histories. As such, under each combination of the soldier’s and attacker’s pure strategies at , the soldier’s possible payoffs are represented by , , and , while the attacker’s payoffs are represented by , , and . Note that, here, we consider that the following inequalities hold: , and . Indeed, reflects the gain that the soldier receives from communicating with the IoBT network without being jammed by the attacker, while reflects the loss the attacker incurs from attempting to communicate with a jammed IoBT network. In addition, reflects the gain the soldier will receive in future steps due to the attacker wasting some of its jamming power when the soldier had not attempted to communicate with the IoBT network. Similarly, we also consider the following inequalities to hold: , . These inequalities correspond to considering that: i) evaded attacks at history yield higher expected material payoffs for the soldier and lower expected material payoffs for the attacker, at the current and future histories, and ii) a successful (unjammed) communication at history yields a higher expected material payoff for the soldier and a lower expected material payoff for the attacker, at the current and future histories. In this respect, we can compute the psychological payoff of the soldier and the attacker under each combination of these pure strategies. In this regard, we consider to be the attacker’s belief representing the probability with which the attacker believes that the soldier will choose action . 
In addition, we consider to be the belief that the soldier has, representing the probability with which the soldier believes that the attacker will choose action . Under the correctness of beliefs defined in (29) and (30) of the PE, these probabilities also reflect the second-order beliefs of the players as well as the actual strategies chosen by each of the players. Starting from the soldier’s side, the soldier’s psychological payoffs at the strategy pairs and are, respectively, and . In addition, psychological payoffs of the soldier at the pure strategy pairs and are, respectively, and . On the other hand, the attacker’s psychological payoffs at strategy pairs and are, respectively, and . In addition, the attacker’s psychological payoffs at strategy pairs and are, respectively, and .

In the conventional dynamic game, in which the frustration of each player is not considered in its opponent’s utility function, we denote the soldier’s and attacker’s strategies accordingly. By using the indifference principle, we can compute the NE strategies of the soldier and the attacker. This computed NE is unique since it can be shown that no NE exists in pure strategies under the set of inequalities defined at the start of the proof, and the equations resulting from the indifference principle admit unique mixed strategies.
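The indifference principle for a 2x2 game can be sketched as follows: each player mixes so that the opponent is indifferent between its two pure actions. The payoff matrices below are illustrative placeholders, not the paper’s actual soldier/attacker payoffs:

```python
def mixed_ne_2x2(A, B):
    """Mixed NE of a 2x2 bimatrix game via the indifference principle.
    A[i][j]: row player's payoff, B[i][j]: column player's payoff, when the
    row player plays i and the column player plays j. Assumes no pure NE and
    an interior solution, as in the proof's inequality conditions."""
    # Column player mixes (q, 1-q) so the row player is indifferent:
    #   q*A[0][0] + (1-q)*A[0][1] = q*A[1][0] + (1-q)*A[1][1]
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # Row player mixes (p, 1-p) so the column player is indifferent:
    #   p*B[0][0] + (1-p)*B[1][0] = p*B[0][1] + (1-p)*B[1][1]
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[0][1] - B[1][0] + B[1][1])
    return p, q

# Illustrative zero-sum example (matching pennies): both players mix 50/50.
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-1.0, 1.0], [1.0, -1.0]]
print(mixed_ne_2x2(A, B))  # (0.5, 0.5)
```

In the psychological game, the same indifference computation applies, except that the frustration terms (weighted by the motivation parameters) enter the payoff entries, which shifts the resulting mixed strategies as in (31) and (32).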

Now, considering the psychological game, we denote the soldier’s and the attacker’s strategies at the PE by and , respectively. By using the indifference principle, with the soldier and the attacker holding correct (i.e. error-free) beliefs on one another, we can compute the PE strategies as follows:

(31)
(32)

where , . Note that, in (31), when , ; when , . At the same time, in (32), when , ; when , . Next, we rewrite (31) as:

(33)

where , , , . Meanwhile, in (33), . Note that, when , we can prove that , which implies that, in (31), increases with an increase in .

Similarly, we rewrite (32) as:

(34)

where , , , . In (34), . Here,