Game of Duplicity: A Proactive Automated Defense Mechanism by Deception Design

by Linan Huang, et al.
NYU college

We present a new game framework called the duplicity game to design defensive deception mechanisms. The mechanism provides the defender with a proactive and automated way to enhance security by defensive deception. Unlike encryption, which hides a secret, our mechanism achieves deception overtly; i.e., the user recognizes the potential deception, yet still has the incentive to participate in the mechanism and follow the defender's regulation. The deception mechanism consists of the joint design of a policy generator, an incentive modulator, and a trust manipulator. A case study of discriminative honeypot configuration is presented to validate the deception mechanism design.




I Introduction

Cyber deception, which has been widely used by attackers for adversarial purposes [1], has recently been adopted by defenders to enhance cybersecurity [2]. Defensive deception technologies, such as moving target defense [3] and honeypots [4], enable a more proactive security posture by deceiving attackers into acting in favor of the defender. Although case-by-case deception designs have been actively proposed in various security scenarios under diverse constraints [4], a unified and quantitative paradigm to understand and design automated defensive deception is lacking; providing one is the main goal of this work.

To build the paradigm, we abstract the following commonalities that underlie various forms of deception scenarios. The defender as the deceiver has access to the system’s private information, which is unknown or uncertain to the user. The defender presents manipulated information to the user by twisting, fabricating, or hiding the private information. The user, whose incentive is to maximize his utility, acts based on the received information and his trust in the information. The defender’s deception goal is twofold: first, to establish trust in the user; second, to mislead the user’s action to align with the defender’s anticipation to the greatest extent possible. Three challenges arise in designing a defense mechanism that achieves the deception goal. First, since the defender’s and the user’s utilities are generally not aligned under the same scenario, it is essential to balance the tradeoff between establishing trust and misleading actions. Second, users can be divided into heterogeneous types based on their targets, resources, and initial trust levels. The defense mechanism needs to consider the user’s type, which affects the user’s behavior under the same manipulated information. Third, the defense mechanism is subject to various constraints that arise from system implementation, capacity, and standards. For example, a honeypot can only exhibit vulnerabilities that are compatible with the system it emulates.

Leveraging the tools from game theory, we propose the game of duplicity as a paradigm to address these challenges. The duplicity game is a two-stage Bayesian game between a defender and a user who has heterogeneous types unknown to the defender. For example, the user can be either legitimate or adversarial. At the first stage, the defender can design a defensive deception mechanism that consists of three essential components, i.e., a policy generator, a trust manipulator, and an incentive modulator. At the second stage, the user observes the defender’s security policy, updates his initial trust through the Bayesian rule, and then uses the network based on his type and the updated trust.

The generator is a mechanism that automatically generates a security policy based on the system’s private information and systemic constraints. For example, the location of honeypots in a computer network is the private information that is only known by the defender. The policy generator in this scenario refers to the honeypot configuration. Under different configurations, the user observes different sets of features regarding protocols, TCP/IP fingerprints, ports, and the response time [5]. These features affect the user’s judgment of whether a node is a honeypot or not, and elicit different behaviors of the user. Thus, each set of features can be viewed as the defender’s security policy which regulates the user’s behavior. In contrast to the generator, which achieves one-shot deception by imposing a security policy, the trust manipulator aims to distort the user’s initial belief of the private information gradually through persistent interactions with the user [6]. For example, the defender can reduce the attacker’s alertness by maintaining a low percentage of honeypots in the long run. Then, the defender achieves a high capture rate of attackers in the honeypot when she increases the percentage occasionally. Finally, the incentive modulator reshapes the incentive structures of the players by designing constrained utility transfers between two players to align the user’s incentive with the defender’s. For example, the defender can prolong the authentication time intentionally to decrease the user’s utility of accessing the honeypot. These three components of the defensive deception mechanism can be designed collectively or independently and empower the defender to harden the security without losing the user’s trust.

Our defensive deception mechanism has three distinctive features. First, unlike encryption which hides a secret, the mechanism achieves deception overtly; i.e., the user accepts the security policy voluntarily as a token of acknowledgment of potential deception. Second, the mechanism is proactive as the defender designs deceptive security policies and anticipates the user’s trust update toward the deception. Third, the mechanism is automated and discriminative as the security policies are generated automatically from the generator and distinguish different types of users by eliciting different behaviors.

We first analyze the duplicity game through the lens of mathematical programming. The primal version quantifies the feasibility and design capacity of the defender’s deception mechanism. On the other hand, the dual version provides an alternative interpretation of the deception design problem as a pricing problem for security as a service (SECaaS). Second, we use the concavification technique [7, 8] to graphically analyze the defender’s joint design of the generator, modulator, and manipulator. We analyze both the regulation limitation of all feasible generators and the regulation efficiency of the optimal generator. From the user’s side, we show how the utility alignment of two types of users affects the separability of their actions. From the defender’s side, we show how the level of the user’s maliciousness affects the defender’s capacity to regulate the user’s action. We find that the user’s level of maliciousness has a threshold impact on the regulatability and the threshold is zero, i.e., it is the sign rather than the exact value of the maliciousness level that affects the regulatability. Finally, we include the joint design of the incentive modulator and the trust manipulator into the generator design, which results in two insights for the deception mechanism design. First, the modulator can be designed independently from the other two without loss of generality. Second, the designs of the policy generator and the trust manipulator boil down to the local and the global initial belief manipulation, respectively.

I-A Notations

A calligraphic letter defines a set, and the number of elements in the set is its cardinality. A bold letter represents a probability vector, i.e., a vector with non-negative entries that sum to one. When the underlying set has only two elements, one element suffices to determine the probability vector, and with a slight abuse of notation we use that single element to represent the whole vector. The defender obtains an ex-ante utility if the user takes action based on his prior belief. The defender obtains an ex-post utility after she implements the deception mechanism to make the user take action based on his posterior belief.

II Related Work

Game theory has been widely applied to study proactive and autonomous defense to enhance cybersecurity [9, 10, 11]. In particular, evidence-based signaling games [12, 13], dynamic Bayesian games [14, 15], Stackelberg security games [16], and partially observable stochastic games [17] have been adopted to study signaling and deception. These incomplete-information games focus on finding the signals and behaviors at the equilibrium under a given mechanism. Restricted by the applications, the signaling mechanism itself is not designable and can result in undesired equilibria. In this work, the designable mechanism empowers the defender to create additional information advantages besides exploiting the existing information asymmetry.

Previous works on security-enhancement mechanisms focus on designing the payoff and allocation rules to incentivize participants’ behaviors [18, 19]. Our duplicity game further incorporates the design of information to incentivize the behaviors and can be viewed as a generalized class of Bayesian persuasion games [8, 20] with heterogeneous receivers and double-sided asymmetric information. The use of types for parametrization and differentiation already exists in the literature. Types have been used to model the user’s (i.e., the deceivee’s) endogenous evidence [21] or prior belief [22, 23] of the state, which directly affects the signaling mechanism. In [24], the deceiver can send separate signals to each type of receiver, and in [25], an additional mechanism is introduced to make the deceivee truthfully report his private type. Compared with these works, we adopt deception to achieve a different design goal: a proactive automated defense mechanism that elicits different behaviors from heterogeneous users to maximize the defender’s utility on average.

III Duplicity Game Model

The game of duplicity consists of four elements; i.e., the basic game , the belief statistics , the information structure , and the utility transfer . The deception is overt as , and the elements of the basic game and the utility transfer are all common knowledge of the game. In Example 1, we illustrate how our duplicity game can help the defender to configure the honeypot to discriminate the behaviors of legitimate and adversarial users and reduce the false positive rate automatically.

III-A Game Elements

The basic game consists of two players , a defender (hereafter she) and a user (hereafter he). Define the finite sets of state , type , and action where , , and are the numbers of possible states, types, and actions, respectively. The defender has private access to the value of the state . The user has a private type and chooses an action . The user can refuse to participate in the game and take no action, which is regarded as a feasible action denoted by . The utilities of the defender and the user depend on the values of the state, type, and action; i.e., .

The belief statistics consists of both players’ prior beliefs of the state or the type. After observing the state value with probability , the defender presumes that the user’s type is with probability . The user of type presumes that the state is with probability . Thus, , , are all valid probability measures for each and . The user’s perceived state distribution can be different from the true state distribution and he may not know the true state distribution. Since the defender can affect the belief statistics of the state gradually through persistent interactions with the user, we assume that a virtual trust manipulator can control the distributions and to the desired values directly and instantly at the beginning of the game.

The information structure consists of a finite set of security policies and a policy generator . The defender with state observation determines the generator . Then, a security policy is imposed automatically and randomly according to the probability distribution . In the honeypot example, the defender configures the honeypot to generate features such as TCP/IP fingerprints and the response time [5] by choosing the interaction levels and the services to emulate. The defender may also have the privilege to disguise a normal server as a honeypot by generating honeypot-related features intentionally [26]. Then, the set consists of all feasible features and the generator refers to the configurations of both honeypots and normal servers.

The utility transfer consists of a scaling factor and an action-dependent incentive modulator which adjusts the utility of the defender and the user to be and , respectively, for all . The defender uses the modulator to incentivize or disincentivize an action .

Definition 1.

An action dominates (resp. is dominated) under type if for all .

Definition 1 defines a special utility structure in which one action brings the most benefit for the user of type regardless of the state value. If the defender does not redesign the modulator , the deception mechanism has no influence on the user of type as he takes action consistently for all generators and manipulators.

Fig. 1: The game timeline of the discriminative honeypot configuration which incentivizes adversarial users and disincentivizes legitimate users simultaneously to access the honeypot.

III-B Game Timeline and Defender’s Design Problem

The timeline of the two-stage game is illustrated in Fig. 1 through the honeypot scenario in Example 1. At stage one, the defender designs (resp. observes) the distributions , the generator , and the modulator if these components can (resp. cannot) be designed. The defender observes the state value according to the distribution and the generator generates the security policy randomly according to . At stage two, the user receives the security policy and obtains his posterior belief by the Bayesian rule whenever possible (footnote 1: if the denominator equals zero under , then ); i.e., ,


Then, the user determines a mixed strategy to maximize his expected utility . Finally, the user takes an action according to his strategy and the defender receives a utility of value . Although the user can adopt mixed strategies, Lemma 1 shows that the defender only needs to consider pure strategies of the user to obtain her maximum utility.
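The belief update at stage two can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `posterior`, `prior`, and `generator` are hypothetical, states and policies are indexed by integers, the numbers are made up, and the fallback to the prior when the denominator is zero is an assumption made here so that the function is defined for every policy.

```python
def posterior(prior, generator, policy):
    """Bayes update of a belief over states after observing one security policy.

    prior[x]             : user's prior probability of state x
    generator[x][policy] : probability that the generator emits `policy` in state x
    """
    joint = [prior[x] * generator[x][policy] for x in range(len(prior))]
    total = sum(joint)
    if total == 0:
        # Assumption: keep the prior when the observed policy has zero probability.
        return list(prior)
    return [j / total for j in joint]


# Hypothetical numbers: state 0 = honeypot, state 1 = normal server.
prior = [0.2, 0.8]
generator = [[0.9, 0.1],   # a honeypot emits policy 0 with probability 0.9
             [0.3, 0.7]]   # a normal server emits policy 0 with probability 0.3
post = posterior(prior, generator, 0)
```

Observing policy 0 shifts the belief toward the honeypot state because that policy is more likely to be generated by a honeypot than by a normal server.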

Lemma 1.

The user’s optimal strategy is pure.

The proof of Lemma 1 follows directly from the characteristic of single-player decision problems. The optimal strategy assigns probability one to the action , which maximizes the expected utility over the posterior belief in (2), and zero probability to all other actions in the set ; i.e.,


Therefore, the user always takes a deterministic action for any security policy from the defender. The defender aims to maximize her post-deception utility


by designing the generator, manipulator, and/or the modulator whenever possible. Define the defender’s optimal value of the deception mechanism as . Different policy generators provide the user with different amounts of information about the state value and two extreme generators are defined in Definition 2.
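The defender's post-deception utility can be evaluated numerically once the user's pure best response is known for every policy. The sketch below is an illustration under stated assumptions: all function names, the index conventions (`v_user[t][x][a]`, `v_def[x][t][a]`), and the payoff numbers are hypothetical, and zero-probability policies fall back to the prior.

```python
def posterior(prior, generator, policy):
    joint = [p * row[policy] for p, row in zip(prior, generator)]
    total = sum(joint)
    # Assumption: fall back to the prior for zero-probability policies.
    return [j / total for j in joint] if total > 0 else list(prior)


def best_response(belief, utility):
    """Pure best response; utility[x][a] is the payoff of action a in state x."""
    n_actions = len(utility[0])
    return max(range(n_actions),
               key=lambda a: sum(b * u[a] for b, u in zip(belief, utility)))


def defender_value(state_prior, type_prior, user_priors, generator, v_def, v_user):
    """Expected defender utility when every user type best-responds to each policy."""
    total = 0.0
    for s in range(len(generator[0])):
        for t, prior in enumerate(user_priors):
            a = best_response(posterior(prior, generator, s), v_user[t])
            for x, row in enumerate(generator):
                total += state_prior[x] * type_prior[x][t] * row[s] * v_def[x][t][a]
    return total


# Hypothetical payoffs. States: 0 honeypot, 1 normal server.
# Types: 0 legitimate, 1 adversarial. Actions: 0 access, 1 do not access.
v_user = [[[-1, 0], [1, 0]],    # legitimate user, indexed v_user[t][x][a]
          [[-2, 0], [2, 0]]]    # adversarial user
v_def = [[[-1, 0], [2, 0]],     # honeypot, indexed v_def[x][t][a]
         [[1, 0], [-2, 0]]]     # normal server
state_prior = [0.2, 0.8]
type_prior = [[0.5, 0.5], [0.5, 0.5]]
user_priors = [[0.2, 0.8], [0.2, 0.8]]

one_policy = defender_value(state_prior, type_prior, user_priors,
                            [[1.0], [1.0]], v_def, v_user)
revealing = defender_value(state_prior, type_prior, user_priors,
                           [[1.0, 0.0], [0.0, 1.0]], v_def, v_user)
```

Comparing `one_policy` and `revealing` shows how the choice of generator alone changes the defender's value while the payoffs and beliefs stay fixed.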

Definition 2.

The policy generator contains zero information if , and full information if the mapping is injective.

If a generator contains zero information, then after receiving any policy generated by , the user’s posterior belief is the same as the prior belief, i.e., . On the other hand, any policy from a full-information generator reveals the state value to the user with probability one.
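The two extreme generators can be recognized from the generator matrix alone. The following sketch assumes, as one reading of Definition 2, that a zero-information generator uses the same policy distribution in every state and that a full-information generator maps each state deterministically to a policy that no other state uses. The helper names are hypothetical.

```python
def is_zero_information(generator, tol=1e-12):
    """True if every state induces the same distribution over policies."""
    first = generator[0]
    return all(abs(p - q) <= tol for row in generator for p, q in zip(row, first))


def is_full_information(generator):
    """True if each state emits exactly one policy and no two states share it."""
    emitted = []
    for row in generator:
        support = [s for s, p in enumerate(row) if p > 0]
        if len(support) != 1:
            return False
        emitted.append(support[0])
    return len(set(emitted)) == len(emitted)


uninformative = [[0.5, 0.5], [0.5, 0.5]]
revealing = [[1.0, 0.0], [0.0, 1.0]]
partial = [[0.9, 0.1], [0.3, 0.7]]
```

Most generators of interest, such as `partial`, lie strictly between these two extremes.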

Example 1.

Theoretically, honeypots are assumed to be capable of achieving a zero false-positive rate by generating decoys accessed only by attackers. This assumption can be violated in practice, as legitimate users fail to identify all decoys in a large-scale and complex network. In corporate networks, the security team that implements honeypots does not reveal their locations to network operators, either because of a lack of communication or to prevent insider threats [27]. Therefore, false alarms from normal user activities cannot be eliminated in production honeypots [28, 29]. Thus, the defender aims to configure the honeypot to simultaneously incentivize adversarial users to access the honeypot and disincentivize legitimate users from doing so.

In this scenario, the defender determines the percentage of honeypots, i.e., to implement in the corporate network yet releases a public report of the percentage as . We assume that users have no additional information and determine the percentage of honeypots based on the report. Thus, the true percentage is the defender’s private information and the user’s perceived percentage , is common knowledge. Since the user does not know the honeypots’ locations, he does not know the state of each node; i.e., whether the node is a honeypot, i.e., state or a normal server, i.e., state . However, the user knows ex ante that the node is a honeypot with probability . The defender does not know whether the user is of legitimate type or adversarial type . However, she can obtain the statistical data of the percentage of legitimate users accessing honeypot nodes (resp. normal nodes ) from public research such as [30]. Thus, and are common knowledge. The configurations of the honeypot and the normal server affect the probability with which the user observes the set of features . Each set of features observed in each node provides the user with evidence about the state of the node. Thus, the user can use the evidence to update the prior belief of that node’s state and choose his subsequent action correspondingly, such as whether to access the node, i.e., action or not, i.e., action . The defender can incentivize (resp. disincentivize ) the user to access the node by providing monetary rewards (resp. prolonging the authentication time intentionally).

IV Problem Analysis

We analyze the duplicity game through the lens of mathematical programming [31] and concavification [8] in Section IV-A and Section IV-B, respectively. The programming method provides the defender with both a unified method to design the generator, manipulator, and the modulator collectively and a high adaptability for various security scenarios by considering additional constraints. On the other hand, the concavification method provides both an intuitive explanation through graphs and structural results.

IV-A Mathematical Programming Perspective

We first elaborate on the relationship between the security policy and the user’s optimal action to illustrate the meaning of the security policy. In the honeypot configuration example, the number of feasible features can be huge and even infinite as some feature components such as the response time [5] can be continuous. However, these features can elicit at most action outcomes; i.e., the user’s optimal action is if his type is for all permutations of . Define , as the -th element of the action set . We can aggregate features based on their elicited action outcomes and classify all features into mutually exclusive subsets, i.e., . If the user observes , then his optimal action is if his type is . Without loss of generality, we can specify and serves as a security policy which instructs the user of type to take action as his optimal action. In Example 1, there are four possible security policies to regulate the user’s action based on the type; i.e., the legitimate user accesses the node while the attacker does not (denoted by ), the attacker accesses the node while the legitimate user does not (denoted by ), both of them access the node (denoted by ), and neither of them does (denoted by ).
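The aggregation of features into action outcomes amounts to enumerating one recommended action per type. A minimal sketch, with hypothetical type and action labels, reproduces the four security policies of Example 1:

```python
from itertools import product


def action_profile_policies(types, actions):
    """Enumerate security policies, each recommending one action to every type."""
    return [dict(zip(types, profile))
            for profile in product(actions, repeat=len(types))]


types = ["legitimate", "adversarial"]
actions = ["access", "no_access"]
policies = action_profile_policies(types, actions)
```

The number of such policies is the number of actions raised to the power of the number of types, which is why the set of policies stays finite even when the underlying feature space is continuous.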

Although the user acknowledges the potential deception embedded in the security policy, he still accepts the security policy voluntarily as this is the best he can do under the defender’s information advantage of the state value as shown in (2). Then, the policy generator under each state is a probability distribution over security policies and we can rewrite (2) as follows; i.e., ,


Plugging (1) into (4), we formulate the defender’s design of the deception mechanism as the following constrained optimization problem in the primal version (COP), which is proven to be feasible and bounded in Theorem 1.

Denote as the maximizers of problem COP and as the value of the objective function under the maximizers. Constraints (a) and (b) restrict to be a valid probability measure. Constraint (c) makes the defender’s deception compatible with the user’s incentives and thus allows the defender to deceive overtly; i.e., the user of type can maximize his benefit by taking , the action from the security policy. Constraint (d) represents a capacity constraint of the utility transfer; i.e., the defender cannot modulate the user’s incentive if the user does not participate in the game. Although we do not restrict the utility transfer function to be bounded for any other actions, Theorem 1 shows that the utility transfer has to remain bounded to optimize COP due to the user’s potential threat of taking the drop-out action . We can also incorporate additional systemic and regulatory constraints. For example, the defender may not have the privilege to disguise a normal server as a honeypot, i.e., is not a decision variable. Restricted by the evolving regulations [32], the defender may only hide information but not generate fake reports, i.e., .

Theorem 1 (Feasibility and Capacity).

The defender’s design problem COP is feasible and bounded. The upper bound of is and the lower bound is where .


Denote , as the optimal action of the user of type under any feasible prior belief and modulator . Then the zero-information generator , is a feasible solution to problem COP.

Since are all probability measures, has upper and lower bounds as and , respectively, when . The generalization of as a decision variable never reduces the value of as , is always a feasible solution. So now we only need to show that the generalization does not increase to infinity. Thus, we can focus on any action , if it exists, where the maximizer has a non-negative value. If , then the drop-out action dominates for all types and is bounded by . On the other hand, if there exists a type where and the defender chooses to overcharge the user for taking action , i.e., , then the user of type will choose the drop-out action . Thus, the defender does not benefit from the overcharge of action and . Combining these two scenarios, we obtain the upper bound of . ∎

Theorem 1 proves the feasibility of the defender’s deception mechanism through an integrated design of the generator, manipulator, and modulator. The upper and lower bounds further provide a design capacity that applies to any duplicity game. Corollary 1 shows that the existence of the drop-out action is necessary for the boundedness of .

Corollary 1.

COP is unbounded without constraint (d).

We prove Corollary 1 by letting equal a constant for all . Then the value of has an increase of and the COP is still feasible. Thus, the defender can achieve arbitrarily large utility by taking an arbitrarily large .
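The argument behind Corollary 1 can be checked on a toy instance. The sketch below assumes one particular transfer form that is not confirmed by the text above: the user's utility is reduced by a charge `charge[a]` and the defender's utility is increased by `gamma * charge[a]`. The utilities stand for expected values at a fixed posterior belief, and all names and numbers are hypothetical.

```python
def outcome(user_u, def_u, charge, gamma):
    """User maximizes utility minus charge; defender collects gamma times the charge."""
    action = max(range(len(user_u)), key=lambda a: user_u[a] - charge[a])
    return action, def_u[action] + gamma * charge[action]


user_u = [0.0, 1.5, 0.7]    # index 0 is the drop-out action
def_u = [0.0, -1.0, 2.0]
gamma = 0.5
charge = [0.0, 1.0, 0.0]    # respects the drop-out constraint: charge[0] == 0

base = outcome(user_u, def_u, charge, gamma)

k = 100.0
# Without the drop-out constraint, every action (including drop-out) is charged k more.
unconstrained = outcome(user_u, def_u, [c + k for c in charge], gamma)
# With the constraint, the drop-out action stays free and becomes the user's choice.
constrained = outcome(user_u, def_u, [charge[0]] + [c + k for c in charge[1:]], gamma)
```

A uniform shift leaves the user's ranking of actions unchanged, so the defender's payoff grows linearly in `k`; once the drop-out action must remain free of charge, the same overcharge drives the user out of the game.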

IV-A1 Violation of Bayesian Plausibility

The concept of Bayesian plausibility has been defined in [8], which states that the expected posterior probability should equal the prior for all valid . However, we show in Lemma 2 that the trust manipulator can violate Bayesian plausibility by making the user of type hold an initial belief different from the defender’s; i.e., .

Lemma 2 (Bayesian Plausibility).

The user’s expected posterior probability under any valid generator and type is always a valid probability measure yet is Bayesian plausible if and only if the defender and the user have the same initial belief .


A valid generates security policy with probability . After receiving , the user of type obtains his posterior belief according to (1). Thus, the expected posterior probability is a valid probability measure over . The Bayesian plausibility requires , under all valid , which is equivalent to the condition . ∎
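Lemma 2 can be verified numerically on a small instance. In the sketch below, policies are drawn according to the defender's prior while the user updates from his own prior. The function name, the numbers, and the fallback to the user's prior for policies he considers impossible are assumptions made for illustration.

```python
def expected_posterior(defender_prior, user_prior, generator):
    """Average the user's posterior over policies drawn under the defender's prior."""
    n_states, n_policies = len(generator), len(generator[0])
    result = [0.0] * n_states
    for s in range(n_policies):
        prob_s = sum(defender_prior[x] * generator[x][s] for x in range(n_states))
        if prob_s == 0:
            continue  # this policy is never generated
        denom = sum(user_prior[x] * generator[x][s] for x in range(n_states))
        for x in range(n_states):
            # Assumption: the user keeps his prior if he deems the policy impossible.
            post = user_prior[x] * generator[x][s] / denom if denom > 0 else user_prior[x]
            result[x] += prob_s * post
    return result


generator = [[0.9, 0.1], [0.3, 0.7]]
defender_prior = [0.2, 0.8]

same = expected_posterior(defender_prior, defender_prior, generator)
different = expected_posterior(defender_prior, [0.5, 0.5], generator)
```

With a common prior the expected posterior coincides with the prior, while a manipulated user prior still yields a valid probability vector that no longer averages back to the user's own prior.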

IV-A2 Nonexistence of Incentive Modulator

If the defender does not have the capacity to change the user’s incentive, then , and COP can be transformed into a linear program (LP) by introducing the following new variables, i.e., . These new variables take non-negative values and satisfy the following new constraints, i.e., and . After we have solved the new LP, we can obtain the value of the initial beliefs by and for every state .

We present the dual of the linear program (DDP) and illustrate how it represents a pricing problem for security as a service (SECaaS). For a more explicit interpretation, we focus on the case where the defender cannot design the initial beliefs . Define shorthand notation and we obtain DDP as follows:

To interpret DDP, we need to introduce an additional player, the defender’s client, who requires the deception mechanism as an on-demand security service to regulate the user’s action. As the service provider, the defender forms her utility function based on the client’s demand and charges the client a price based on the value of the state. The price consists of a demand-driven base fee, i.e., , based on the client’s expected utility when the user takes action and a compensatory fee based on the level of difficulty to regulate the user to take action under state . In particular, the dual variable represents the unit price to regulate the user of type to take action rather than under the security policy .

If an action dominates under type as defined in Definition 1, then , for all valid , and the user naturally has the incentive to take action . In that case, the defender charges the client zero compensatory fee; i.e., . On the other hand, if is dominated under type , then the deception mechanism cannot regulate the user to take action and the compensatory fee goes to infinity. For all other cases, the client pays a reasonable unit fee of to the defender for using the deception mechanism and regulating the user’s action. Meanwhile, the client aims to minimize the total payment to the defender under various security situations, i.e., .

IV-B Concavification and Graphic Analysis

Section IV-B1 provides the optimal generator design under the benchmark case where the defender can neither modulate the user’s incentive, i.e., , nor distort their initial beliefs. In Section IV-B2 and IV-B3, the defender further incorporates the design of the incentive modulator and the trust manipulator into the deception mechanism to make the user’s behavior more regulatable and achieve a higher utility.

Throughout Section IV-B, the drop-out action always exists and we focus on a common prior belief for the defender and the user, i.e., to provide a more explicit graphic analysis. Define and the common prior belief in the vector form as . Since different types of users have the same initial beliefs, the posterior beliefs are also the same for a valid generator. Denote as the user’s posterior belief under state , the belief vector , and the utility vector .

IV-B1 Generator Design under the Benchmark Case

As stated in (2), the user of type aims to find an action to maximize his expected utility for a given posterior belief . Thus, we can write the optimal action as a function of . Since the user’s expected utility is an affine function of for any action , maximizing over for all in the convex domain results in a piecewise linear and convex (PWLC) function as summarized in Lemma 3. The proof of convexity follows directly from the fact that the value of is the point-wise maximum of a group of affine functions over .

Lemma 3.

The optimal expected utility of the user of type , i.e., , is continuously PWLC with respect to vector .

We visualize the PWLC property of the user’s optimal expected utility under a binary state set in Fig. 2. The x-coordinates of the circles represent four belief thresholds, , and , which divide the entire belief region into three sub-regions; i.e., the user of type takes action if his posterior belief belongs to the sub-region , action if , and action if . Although action is not dominated under type based on Definition 1, it is inactive over . We define as the set of the belief thresholds for the user of type .

Fig. 2: The expected utility of the user of type versus posterior belief under states and actions. The solid lines represent the user’s optimal expected utility as a PWLC function of . The set of belief thresholds is .
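The PWLC structure of Lemma 3 can be reproduced for a binary state with a short numerical sketch. The utilities below are hypothetical and are not the values plotted in Fig. 2; each action is an affine function of the belief `p` that the second state holds, and the user's optimal value is their upper envelope.

```python
def value_and_action(p, lines):
    """lines[a] = (utility in state 0, utility in state 1); p = belief in state 1."""
    values = [(1 - p) * u0 + p * u1 for u0, u1 in lines]
    best = max(range(len(lines)), key=values.__getitem__)
    return values[best], best


# Hypothetical utilities for four actions. Action 3 beats action 0 in state 1,
# so it is not worst everywhere, yet it never reaches the upper envelope.
lines = [(1.0, 0.0), (0.6, 0.6), (0.0, 1.0), (0.5, 0.3)]
grid = [i / 100 for i in range(101)]

active = {value_and_action(p, lines)[1] for p in grid}

# Midpoint convexity of the upper envelope, checked on the grid.
convex = all(
    value_and_action((p + q) / 2, lines)[0]
    <= (value_and_action(p, lines)[0] + value_and_action(q, lines)[0]) / 2 + 1e-12
    for p in grid for q in grid
)
```

In this instance the envelope switches actions at the interior beliefs 0.4 and 0.6, which together with the endpoints 0 and 1 give four thresholds and three sub-regions, and one action stays inactive over the whole belief interval.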

For a high-dimensional state space , the user’s entire belief region is a simplex. For each type , we can divide the entire belief region into at most sub-regions , and . If the posterior belief falls into the sub-region , the user of type takes as his optimal action. In Fig. 2, is the interval and is the empty set. Lemma 4 shows that all sets , are convex sets, thus they are also connected. The proof follows directly from the definition of convexity.

Lemma 4.

Sets , are convex.

We have illustrated the belief region partition under any given type . Since the user has possible types, we further divide the belief region into finer sub-regions. Let set be the sub-region of the posterior belief where the user of type takes as the optimal action. In particular, define as the belief region where the user takes action when his type is and when his type is for all and . We can also obtain by combining all the where and . Since the intersection of any collection of convex sets is convex, and are all convex and connected sets, i.e., convex polytopes. We visualize these convex polytopes of a -simplex in Fig. 3.

Fig. 3: An illustration of convex polytopes with three types , two actions , and three states . The entire region is a simplex, i.e., an equilateral triangle.

Although there are possible sets, i.e., , they cannot be all nonempty at the same time. Taking as an example, actions can generate at most belief thresholds over for each type as shown in Fig. 2. Thus, the whole belief region can be divided into at most regions. When and the belief region is -simplex as shown in Fig. 3, for each type, actions representing planes can generate at most lines that can be projected vertically into the interior of the -simplex. Thus, these lines can divide the -simplex into at most belief regions. The results can be extended to as a variant of the hyperplane arrangement problem [33] where the number of belief region partitions grows at a polynomial rate of rather than an exponential rate of , which is summarized in Lemma 5.

Lemma 5 (Regulation Limitation).

For all duplicity games with feasible generators represented by the vector , at most elements of are nonzero for all where is a polynomial function of for all .

Remark 1.

If under the given duplicity game, then it is impossible for the defender to regulate the user of type to take action for all by designing policy generators independently. Lemma 5 illustrates the defender’s regulation limitation for any security scenario; i.e., among all potential security policies, the defender can choose at most policies as the possible output of the generator to avoid violating the user’s incentive.
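The regulation limitation can be observed directly in the binary-state case by collecting the action profiles that actually occur as the posterior belief sweeps the unit interval. The utilities and names below are hypothetical. The interval bound used in the sketch is an elementary counting argument for two states (each type contributes at most one fewer interior breakpoint than it has actions), stated here as an illustration rather than as the bound of Lemma 5.

```python
def best_action(p, lines):
    """lines[a] = (utility in state 0, utility in state 1); p = belief in state 1."""
    values = [(1 - p) * u0 + p * u1 for u0, u1 in lines]
    return max(range(len(lines)), key=values.__getitem__)


def realized_profiles(type_lines, grid):
    """Action profiles (one action per type) that occur for some belief on the grid."""
    return {tuple(best_action(p, lines) for lines in type_lines) for p in grid}


# Hypothetical utilities: three types, two actions each.
type_lines = [
    [(1.0, 0.0), (0.0, 1.0)],   # type 0 switches action at p = 0.5
    [(1.0, 0.0), (0.0, 3.0)],   # type 1 switches action at p = 0.25
    [(0.0, 1.0), (0.3, 0.3)],   # type 2 switches action at p = 0.3
]
grid = [i / 100 for i in range(101)]
profiles = realized_profiles(type_lines, grid)

n_types, n_actions = len(type_lines), 2
all_profiles = n_actions ** n_types               # every conceivable profile
interval_bound = n_types * (n_actions - 1) + 1    # breakpoints cut [0, 1] into this many pieces
```

Only a fraction of the conceivable profiles is realized, so a generator can place positive probability only on the corresponding policies without violating the user's incentive.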

The honeypot example motivates a question: can public security policies elicit different actions from different types of users? If the answer is positive, the deception mechanism can distinguish between adversarial and legitimate users automatically. Since each security policy uniquely determines a posterior belief, we define separability between the user of type and type concerning the posterior belief in Definition 3.

Definition 3.

For the given and , a posterior belief is -separable if there exists and such that .

All the posterior beliefs that are -separable comprise a -separable belief region which may not be connected. Intuitively, the size of the region is reduced as the utilities of the users of type and become more aligned. Two utilities are perfectly aligned (resp. misaligned) if they have the same value (resp. opposite values). Theorem 2 shows that a scaling of and a translation of do not change the separability of the user’s actions under type and type .

Theorem 2 (Type Separability).

Suppose the user’s utilities under type and satisfy a linear transformation; i.e., there exists , such that . Then, no posterior belief is -separable if and only if ; all posterior beliefs are -separable if and only if .


For any given , there exists an action such that . Thus, . Then the user of type at posterior belief has the same optimal action if and only if . ∎

Remark 2.

Theorem 2 implies that if two types of users have completely opposite (resp. identical) utilities, there exist no security policies that can induce them to take the same action (resp. different actions) under any security scenario.
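The effect of a linear transformation between two types' utilities can be illustrated numerically. The sketch below assumes that a belief counts as separable when the two types' best responses differ, and it uses a state-dependent shift; both are assumptions for illustration, as are the names and numbers. The grid is chosen to avoid beliefs at which the first type is indifferent between two actions.

```python
def best_action(p, lines):
    """lines[a] = (utility in state 0, utility in state 1); p = belief in state 1."""
    values = [(1 - p) * u0 + p * u1 for u0, u1 in lines]
    return max(range(len(lines)), key=values.__getitem__)


def transform(lines, rho, shift):
    """Utilities of the second type: rho times the first type's, plus a per-state shift."""
    return [(rho * u0 + shift[0], rho * u1 + shift[1]) for u0, u1 in lines]


base = [(1.0, 0.0), (0.6, 0.6), (0.0, 1.0)]
grid = [(i + 0.5) / 20 for i in range(20)]   # skips the indifference points 0.4 and 0.6

aligned = transform(base, 2.0, (1.0, -0.5))    # positive scaling
opposed = transform(base, -1.0, (1.0, -0.5))   # negative scaling

never_separable = all(best_action(p, base) == best_action(p, aligned) for p in grid)
always_separable = all(best_action(p, base) != best_action(p, opposed) for p in grid)
```

A positive scaling preserves the ranking of actions at every belief and a negative scaling reverses it, while the shift adds the same amount to every action and therefore plays no role.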

Given all the previous structural results from the user’s side, we are now able to characterize the defender’s optimal design of the policy generator. We define the defender’s ex-ante utility under the common prior belief as . Since is linear with respect to inside each convex polytope , and Lemma 5 provides an upper bound of the number of different convex polytopes, we obtain the piecewise linear structure of the defender’s ex-ante utility in Lemma 6. Since the region is determined based on the user’s ex-ante utility rather than the defender’s, is in general discontinuous at the boundary of these convex polytopes.

Lemma 6.

The defender’s ex-ante utility is a (possibly discontinuous) piecewise linear function of the common prior belief with at most pieces.
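The piecewise linear and discontinuous shape described in Lemma 6 appears already in a two-state honeypot instance. The payoffs, the type split, and the function names below are hypothetical, and the type distribution is assumed to be independent of the state.

```python
def best_action(belief, utility):
    """Pure best response; utility[x][a] is the payoff of action a in state x."""
    return max(range(len(utility[0])),
               key=lambda a: sum(b * u[a] for b, u in zip(belief, utility)))


def ex_ante_utility(p_honeypot, type_prior, v_def, v_user):
    """Defender's expected utility when each type best-responds to the common belief."""
    belief = [p_honeypot, 1.0 - p_honeypot]
    total = 0.0
    for t, share in enumerate(type_prior):
        a = best_action(belief, v_user[t])
        total += share * sum(b * v_def[x][t][a] for x, b in enumerate(belief))
    return total


# States: 0 honeypot, 1 normal server. Actions: 0 access, 1 do not access.
v_user = [[[-1, 0], [3, 0]],    # legitimate user: accesses while p_honeypot <= 0.75
          [[-3, 0], [1, 0]]]    # adversarial user: accesses while p_honeypot <= 0.25
v_def = [[[-1, 0], [2, 0]],     # honeypot, indexed v_def[x][t][a]
         [[1, 0], [-2, 0]]]     # normal server
type_prior = [0.5, 0.5]

values = {p: ex_ante_utility(p, type_prior, v_def, v_user)
          for p in (0.1, 0.15, 0.2, 0.3, 0.8)}
```

Within each belief region the value is linear in the belief, and it jumps when the adversarial user stops accessing the node, because the region boundaries are set by the user's indifference rather than the defender's.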

Since the defender can implement generator to alter the user’s prior belief and further the action, we use the concavification technique introduced in [7, 8] to relate the defender’s ex-ante utility with the ex-post utility