I. Introduction
Game theory has long been a powerful and conventional paradigm for modeling complex and intelligent interactions among a group of players and for improving decision making of selfish players, since the seminal works [1, 2, 3] by John von Neumann, John Nash, and others. It has since found a vast range of real-world applications in a variety of domains, including economics, biology, finance, computer science, and politics, where each individual player is concerned only with its own interest [4, 5, 6]. It played an extremely important role even during the Cold War in the 1960s, and has been employed by many national defense institutions, such as United States agencies for security control [7].
Adversarial games are a particularly important class of game models, where players deliberately compete with each other while simultaneously pursuing their own utility maximization. To date, adversarial games have been an orthodox framework for shaping highly efficient decision making in numerous realistic applications, such as poker, chess, pursuit-evasion, drug interdiction, coast guard patrolling, cybersecurity, and national defense. For example, in Texas Hold'em poker, which has served as a primary benchmark competition for testing proposed algorithms in game theory and artificial intelligence (AI) at internationally well-known conferences such as AAAI, multiple players compete against each other to win the game by seeking sophisticated strategy techniques [8]. Generally speaking, adversarial games exhibit several main features: 1) hardness of designing efficient and fast algorithms with limited computing resources and/or samples; 2) imperfect information in many practical problems, that is, some information is private to one or more players and hidden from the others, as in the card game of poker; 3) large models, including large action spaces and information sets; for example, the adversary space in the road-network security problem is of an enormous order [9]; 4) incomplete information in a multitude of real-life applications, that is, one or more agents do not know what game is being played (e.g., the number of players, the strategies available to each player, and the payoff for each strategy), in which case the game being played is generally represented with players' uncertainties, such as uncertain payoff functions with uncertain parameters; and 5) a possibly dynamic nature, i.e., the played game is sometimes time-varying rather than static; for example, a poacher may adopt different poaching strategies in a wildlife park as the environment varies with the seasons.
It is worth pointing out that incomplete information is understood here as distinct from imperfect information, as distinguished by some researchers, although the two terms are used interchangeably in some literature. In addition, other possible characteristics include bounded rationality, where players may not be fully rational, such as arbitrarily random lone-wolf attacks by terrorists. However, it is noteworthy that not all adversarial games involve imperfect and/or incomplete information. For example, the game of Go has both perfect and complete information, since it has explicit rules, and all stones' positions as well as the opponent's actions are visible to both players at all times; it has been mastered by well-known AI agents such as AlphaGo and AlphaZero [10, 11, 12].
As the competitive feature is ubiquitous in a large number of real-world applications, adversarial games have been extensively investigated [13, 14, 15, 16, 17, 18]. For example, the authors in [13] provided a broad survey of technical advances in Stackelberg security games (SSG) in 2018; the authors in [14] reviewed main Nash equilibrium (NE) computing algorithms for extensive-form games with imperfect information based on counterfactual regret minimization (CFR) methods; the authors in [15] reviewed the combined use of game theory and optimization algorithms along with a new categorization of research conducted in this area; the authors in [16] reviewed distributed online optimization and federated optimization from the perspective of privacy-preserving mechanisms, and cooperative/noncooperative games from two facets, i.e., minimizing global costs and minimizing individual costs; and the authors in [17] surveyed recent advances in decentralized online learning, including decentralized online optimization and online games, from the perspectives of problem classifications, performance metrics, state-of-the-art results, and potential future research directions. Additionally, in consideration of the importance of game theory in national defense, reviews of game theory in defense applications were succinctly provided in [18, 19], and a survey of defensive deception based on game theory and machine learning (ML) approaches was presented in [20]. Nonetheless, a thorough overview of adversarial games from the perspectives of basic model knowledge, equilibrium concepts, optimal strategy seeking techniques, research frontiers, and prevailing algorithms is still lacking.

Motivated by the above facts, this survey aims to provide a systematic review of adversarial games along several dimensions, including three main models frequently employed in adversarial games (i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games, and zero-sum differential games), (approximate) optimal strategy concepts (i.e., NE, correlated equilibrium, coarse correlated equilibrium, strong Stackelberg equilibrium, team-maxmin equilibrium, and their approximate counterparts), (approximate) optimal strategy computing techniques (e.g., CFR methods and AI methods), state-of-the-art results, prevailing algorithms, potential applications, and promising future research directions. To the best of our knowledge, this survey is the first systematic overview of adversarial games, providing an orthogonal and complementary component to the aforementioned survey papers, which may aid researchers and practitioners in relevant domains. Please note that the three game models are not mutually exclusive, but may overlap for the same game viewed from different angles; for example, Stackelberg games and differential games can also be zero-sum games. In addition, other models are also leveraged for adversarial games, such as Bayesian games, Markov games (or stochastic games), signaling games, behavioral game theory, and evolutionary game theory. However, we do not attempt to review all of them in this survey, since each is of independent interest and already covered by abundant existing materials.
This survey is organized as follows. The detailed game models and solution concepts are introduced in Section II, the existing main literature is reviewed along with state-of-the-art results in Section III, some prevailing algorithms are expounded in Section IV, an array of applications is presented in Section V, promising future research directions are discussed in Section VI, and finally the conclusion is drawn in Section VII.
Notations: Define $[n]:=\{1,2,\ldots,n\}$ for a positive integer $n$. Denote by $\mathbb{R}$, $\mathbb{R}^n$, and $\mathbb{R}^n_+$ the sets of real numbers, $n$-dimensional real vectors, and nonnegative $n$-dimensional real vectors, respectively. For a finite set $S$ with $n$ elements, define $\Delta(S):=\{p\in\mathbb{R}^n_+:\sum_{i=1}^n p_i=1\}$ (i.e., the simplex of dimension $n-1$), and let $|S|$ be the cardinality of $S$. Let $\mathbb{P}$ and $\mathbb{E}$ denote the mathematical probability and expectation, respectively. Let $x^\top$ denote the transpose of $x$, and $\langle x,y\rangle$ the inner product. $\mathbf{0}$ and $\mathbf{1}$ denote vectors or matrices with all entries being $0$ and $1$, respectively, of compatible dimension in the context, sometimes with an explicit subscript indicating the dimension.

II. Models of Adversarial Games
This section provides three main models for adversarial games, i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games, and differential games, along with solution concepts for these game models; a general framework of adversarial games is illustrated in Fig. 1.
II-A Zero-Sum Normal-Form and Extensive-Form Games
Normal-form and extensive-form games are two widely employed game models, accounting for simultaneous and sequential actions committed by the players in a game, respectively.
Normal-Form Games (NFGs). A normal-form (or strategic-form) game is denoted by a tuple $(\mathcal{N},\mathcal{A},u)$ [4], where $\mathcal{N}=\{1,\ldots,N\}$ is a finite set of players. In the meantime, $\mathcal{A}=A_1\times\cdots\times A_N$ is the action profile set for all players, where $A_i$ is the set of pure actions or strategies available to player $i$, and $a=(a_1,\ldots,a_N)\in\mathcal{A}$ is a joint action profile. Moreover, $u=(u_1,\ldots,u_N)$, where $u_i:\mathcal{A}\to\mathbb{R}$ is a real-valued utility (or payoff) function for player $i$. Also, a mixed strategy/policy for player $i$ is a probability distribution over its action set $A_i$, denoted by $x_i\in\Delta(A_i)$, and $x_i(a_i)$ denotes the probability for player $i$ to commit an action $a_i\in A_i$. The expected utility of player $i$ can be expressed as $u_i(x)=\mathbb{E}_{a\sim x}[u_i(a)]$, where $x=(x_i,x_{-i})$ is the joint (mixed) action profile and $x_{-i}$ denotes the joint action profile of all players except player $i$. Similarly, let $a_{-i}$ be the joint (pure) action profile of all players except player $i$, and write $a=(a_i,a_{-i})$ to manifest the dependency on a joint pure action profile. The social welfare is defined as $\mathrm{sw}(a):=\sum_{i\in[N]}u_i(a)$ for a pure action profile $a$, whose mixed strategy correspondence is given as $\mathrm{sw}(x):=\mathbb{E}_{a\sim x}[\mathrm{sw}(a)]$. In addition, the game is called constant-sum if for any action profile $a$, it holds that $\sum_{i\in[N]}u_i(a)=c$ for a constant $c$, and called zero-sum if $c=0$, as illustrated in Fig. 2. Note that games with continuous action sets, generally assumed closed and convex, are usually called continuous games.
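As an illustrative sketch (not taken from the cited works), the expected utility of a mixed strategy profile can be computed by summing over joint pure action profiles; the matching-pennies payoffs below are a standard textbook example:

```python
from itertools import product

# A two-player zero-sum normal-form game: matching pennies.
# u[i][(a1, a2)] is player i's payoff for the joint pure action profile.
actions = ["H", "T"]
u1 = {(a1, a2): 1.0 if a1 == a2 else -1.0 for a1, a2 in product(actions, actions)}
u = [u1, {a: -v for a, v in u1.items()}]          # zero-sum: u2 = -u1

def expected_utility(i, x):
    """u_i(x): weight u_i(a) by the product of the players' mixed probabilities."""
    total = 0.0
    for a in product(actions, actions):
        prob = x[0][a[0]] * x[1][a[1]]            # independent mixed strategies
        total += prob * u[i][a]
    return total

x = [{"H": 0.5, "T": 0.5}, {"H": 0.5, "T": 0.5}]  # uniform mixed strategies
print(expected_utility(0, x), expected_utility(1, x))
```

At the uniform profile (the NE of matching pennies) both expected utilities are zero.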
In what follows, extensive-form games with imperfect information are introduced; they reduce to games with perfect information when every information set of each player is a singleton [5].
Imperfect-Information Extensive-Form Games (IIEFGs). An IIEFG is a tuple $(\mathcal{N},H,Z,A,P,u)$, where $\mathcal{N}=\{1,\ldots,N\}$ is a finite set of players, $H$ is a set of histories (i.e., nodes) representing possible sequences of actions, and $Z\subseteq H$ denotes the set of terminal nodes, which admit no further actions and award a value to each player. Outside of $\mathcal{N}$, a different "player" exists, denoted $c$, representing chance decisions. Moreover, the empty sequence $\varnothing$ is included in $H$, standing for the unique root node. At a nonterminal node $h$, $A(h)$ is the action function assigning a set of available actions at $h$ (here $A$ is different from that in normal-form games, which should be clear from the context), and $P(h)$ is the player function assigning to $h$ the player who takes an action at that node, with $P(h)=c$ if chance determines the action at $h$. And $h\sqsubseteq h'$ means that $h'$ is led to from $h$ by a sequence of actions, i.e., $h$ is a prefix of $h'$. $u=(u_1,\ldots,u_N)$ is the set of utility functions, where $u_i:Z\to\mathbb{R}$ is the utility function of player $i$. If there is a constant $c_0$ such that $\sum_{i\in[N]}u_i(z)=c_0$ for all $z\in Z$, then the game is called a constant-sum game, and a zero-sum game when $c_0=0$.
The main feature of "imperfect information" is represented by information sets (infosets) for all players. Specifically, $\mathcal{I}_i$ is the set of information sets of player $i$, which forms a partition of $\{h\in H:P(h)=i\}$ satisfying $A(h)=A(h')$ for any $h,h'\in I$ with $I\in\mathcal{I}_i$. That is, all nodes in the same infoset $I\in\mathcal{I}_i$ are indistinguishable to player $i$. Note that each node belongs to exactly one infoset of the acting player. When all players can remember all historical information, the game is said to have perfect recall. Informally, perfect recall means that no player ever forgets the infosets it has visited or the actions it has taken: any two histories in the same infoset of player $i$ must pass through the same sequence of player-$i$ infosets and the same player-$i$ actions along the way.
A normal-form plan (or pure strategy) of player $i$ is a tuple $\pi_i\in\Pi_i:=\times_{I\in\mathcal{I}_i}A(I)$, which assigns an action to each infoset of player $i$. A normal-form strategy is a probability distribution over $\Pi_i$, i.e., an element of $\Delta(\Pi_i)$. A behavioral strategy (or simply, strategy) $\sigma_i$ assigns a probability distribution over $A(I)$ to each infoset $I$ of player $i$. A joint strategy profile $\sigma=(\sigma_1,\ldots,\sigma_N)$ is composed of all players' strategies, with $\sigma_{-i}$ representing all the strategies except $\sigma_i$. Denote by $\sigma_i(I,a)$ (or simply $\sigma_i(a)$) the probability of a specific action $a$ at infoset $I$, and by $\pi^\sigma(h)$ the reach probability of history $h$ if all the players select their actions according to $\sigma$. For a strategy profile $\sigma$, player $i$ has its total expected payoff as $u_i(\sigma)=\sum_{z\in Z}\pi^\sigma(z)u_i(z)$. Denote by $\Sigma_i$ the set of all possible strategies for player $i$.
A best response for player $i$ to $\sigma_{-i}$ is a strategy $\mathrm{BR}(\sigma_{-i})\in\arg\max_{\sigma_i'\in\Sigma_i}u_i(\sigma_i',\sigma_{-i})$. In a two-player zero-sum game, the exploitability of a strategy $\sigma_i$ is defined as $e_i(\sigma_i):=u_i(\sigma^*)-\min_{\sigma_{-i}}u_i(\sigma_i,\sigma_{-i})$, where $\sigma^*$ is a Nash equilibrium, as defined later. In multiplayer games, the total exploitability (or NashConv) of a strategy profile $\sigma$ is defined as [21] $\mathrm{NashConv}(\sigma):=\sum_{i\in[N]}\big(\max_{\sigma_i'}u_i(\sigma_i',\sigma_{-i})-u_i(\sigma)\big)$, and the average exploitability (or simply exploitability) is defined as $\mathrm{NashConv}(\sigma)/N$, which is leveraged to measure how much the players can gain by unilaterally deviating to their best responses, and is generally interpreted as a distance from a Nash equilibrium.
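The NashConv computation above can be sketched in a few lines for a normal-form game; pure best responses suffice against a fixed opponent mix, and rock-paper-scissors is used here as a hypothetical example:

```python
# NashConv and average exploitability for a two-player zero-sum normal-form
# game, via best responses over pure strategies (a pure best response always
# attains the maximum against a fixed opponent mixed strategy).
ACTIONS = ["R", "P", "S"]

def u1(a1, a2):  # rock-paper-scissors payoff for player 1 (zero-sum)
    beats = {("R", "S"), ("P", "R"), ("S", "P")}
    return 1.0 if (a1, a2) in beats else (-1.0 if (a2, a1) in beats else 0.0)

def best_response_value(i, opp):
    """Max over own pure actions of the expected utility against opp's mix."""
    def ev(a):
        return sum(p * (u1(a, b) if i == 0 else -u1(b, a)) for b, p in opp.items())
    return max(ev(a) for a in ACTIONS)

def nash_conv(x1, x2):
    ev1 = sum(p * q * u1(a, b) for a, p in x1.items() for b, q in x2.items())
    # player 1's gain from deviating + player 2's gain from deviating
    return (best_response_value(0, x2) - ev1) + (best_response_value(1, x1) + ev1)

uniform = {a: 1 / 3 for a in ACTIONS}
print(nash_conv(uniform, uniform))   # zero at the uniform NE
```

Dividing NashConv by the number of players gives the average exploitability.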
Note that besides the above normal-form and extensive-form games, other classes of games may be conducive to modeling adversarial interactions as well, such as Markov games (or stochastic games) [22], where the game state changes according to a transition probability based on the current game state and players' actions, and Bayesian games [23], which model game uncertainties with incomplete information, and so forth.
In what follows, some solution concepts for related games are introduced.
The Nash equilibrium is the most widely adopted notion in the literature [2].
Definition 1 (Nash Equilibrium (NE)).
For both normal-form and extensive-form games, a strategy profile $\sigma^*=(\sigma_i^*,\sigma_{-i}^*)$ is called an $\epsilon$-NE for a constant $\epsilon\ge 0$ if

$$u_i(\sigma_i,\sigma_{-i}^*)\le u_i(\sigma_i^*,\sigma_{-i}^*)+\epsilon,\quad\forall\sigma_i\in\Sigma_i,\;\forall i\in[N],\qquad(1)$$

that is, the gain is at most $\epsilon$ if any player changes its own strategy unilaterally. Moreover, it is called an NE when $\epsilon=0$, that is, $\sigma_i^*$ is a best response to $\sigma_{-i}^*$ for any player $i$, i.e., $\sigma_i^*\in\arg\max_{\sigma_i\in\Sigma_i}u_i(\sigma_i,\sigma_{-i}^*)$.
It is well known that at least one NE in mixed strategies exists for games with a finite number of players and a finite number of pure strategies for each player [2].
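For two-player zero-sum games, an NE can be approached by simple self-play learning. The sketch below runs full-information regret matching (Hart and Mas-Colell's procedure, not specific to this survey) on rock-paper-scissors; the time-averaged strategies are known to converge to the NE, here the uniform distribution:

```python
# Regret matching in self-play on rock-paper-scissors: the average strategy
# converges toward the (uniform) NE. A slightly asymmetric initialization is
# used so that the dynamics leave the symmetric starting point.
N = 3
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # row player's utility (zero-sum)

def rm_strategy(regret):
    pos = [max(r, 0.0) for r in regret]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / N] * N

def ev(i, a, opp):
    # expected utility of pure action a for player i against the opponent's mix
    return sum(q * (PAYOFF[a][b] if i == 0 else -PAYOFF[b][a])
               for b, q in enumerate(opp))

regrets = [[1.0, 0.0, 0.0], [0.0] * N]          # asymmetric start (hypothetical)
avg = [[0.0] * N for _ in range(2)]
for _ in range(50000):
    strat = [rm_strategy(regrets[0]), rm_strategy(regrets[1])]
    for i in range(2):
        evs = [ev(i, a, strat[1 - i]) for a in range(N)]
        base = sum(s * e for s, e in zip(strat[i], evs))
        for a in range(N):
            regrets[i][a] += evs[a] - base      # accumulate counterfactual regret
            avg[i][a] += strat[i][a]

norm = [[v / sum(row) for v in row] for row in avg]
print([round(p, 3) for p in norm[0]])           # close to [1/3, 1/3, 1/3]
```

The last iterates cycle, but the averages approach the equilibrium, a standard phenomenon in zero-sum learning dynamics.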
Even though an NE exists for many games and its computation is efficient for two-player zero-sum games, it is well known from complexity theory that approximating an NE in $n$-player ($n\ge 3$) zero-sum games and even in two-player non-zero-sum games is computationally hard; in fact, it is PPAD-complete for general games [24, 25, 26]. As an alternative, the (coarse) correlated equilibrium is often considered for normal-form games in the literature, which is efficiently computable in all normal-form games, as defined in the following [27].
Definition 2 (Correlated Equilibrium (CE)).
For a normal-form game $(\mathcal{N},\mathcal{A},u)$, an $\epsilon$-CE is a probability distribution $\mu$ over $\mathcal{A}$ such that for each player $i$ and any swap function $\phi_i:A_i\to A_i$ (usually called a strategy modification),

$$\mathbb{E}_{a\sim\mu}\big[u_i(\phi_i(a_i),a_{-i})\big]\le\mathbb{E}_{a\sim\mu}\big[u_i(a)\big]+\epsilon.\qquad(2)$$

That is, no player can gain more than $\epsilon$ by unilaterally deviating from the action privately recommended to it by a coordinator who samples a joint action from that distribution. Furthermore, another relevant notion is defined below [28].
Definition 3 (Coarse Correlated Equilibrium (CCE)).
For a normal-form game $(\mathcal{N},\mathcal{A},u)$, an $\epsilon$-CCE is a probability distribution $\mu$ over $\mathcal{A}$ such that for each player $i$ and all actions $a_i'\in A_i$,

$$\mathbb{E}_{a\sim\mu}\big[u_i(a_i',a_{-i})\big]\le\mathbb{E}_{a\sim\mu}\big[u_i(a)\big]+\epsilon.\qquad(3)$$

The above condition looks almost the same as that for CE, except for the removal of the conditioning on the recommended action $a_i$: in a CCE, a player deviates by arbitrarily selecting a fixed action on its own before seeing the recommendation, instead of swapping the action advised by the coordinator. For NE, CE, and CCE, it is known that they are payoff equivalent to each other in two-player zero-sum games by the minimax theorem [29]. Recently, the notions of CE and CCE have been extended to extensive-form games in [30, 31], which, however, have been less studied so far.
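Verifying the CCE condition (3) is a direct computation over the joint distribution. The sketch below uses hypothetical coordination-game payoffs (not from the cited works); the distribution correlating on the two pure equilibria passes the check:

```python
# Checking the CCE condition for a joint distribution mu over action profiles
# in a two-player game with hypothetical coordination payoffs.
A = ["x", "y"]
u = {  # u[(a1, a2)] = (player 1's payoff, player 2's payoff)
    ("x", "x"): (2, 1), ("x", "y"): (0, 0),
    ("y", "x"): (0, 0), ("y", "y"): (1, 2),
}

def is_cce(mu, eps=1e-9):
    for i in range(2):
        base = sum(p * u[a][i] for a, p in mu.items())
        for dev in A:  # a fixed deviation, chosen without seeing the recommendation
            swapped = sum(
                p * u[(dev, a[1]) if i == 0 else (a[0], dev)][i]
                for a, p in mu.items()
            )
            if swapped > base + eps:
                return False
    return True

# Correlating equally on the two pure NEs is a CCE (and in fact a CE).
mu = {("x", "x"): 0.5, ("y", "y"): 0.5, ("x", "y"): 0.0, ("y", "x"): 0.0}
print(is_cce(mu))   # True
```

Note that this correlated distribution is not a product distribution, so it is not an NE, illustrating that the CCE set strictly contains the NE set in general.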
In an IIEFG, let us consider the case where the players in $\mathcal{T}:=\{1,\ldots,N-1\}$ are cooperative, thus forming a team, whose members take actions independently and play against an adversary (player $N$), with all team members sharing the same utility function and the game being zero-sum; this is called a zero-sum single-team single-adversary extensive-form team game (or simply a zero-sum team game (TG)) [32]. Before introducing the notion of team-maxmin equilibrium, it is necessary to first prepare some essentials. Let $Q_i$ denote the set of action sequences of player $i$, where an action sequence of player $i$, defined by a node $h$, is the ordered set of actions of player $i$ on the path from the root to $h$. Let $q_\varnothing$ be the dummy sequence at the root. A realization plan is a function $r_i:Q_i\to[0,1]$ mapping each action sequence to a probability, satisfying

$$r_i(q_\varnothing)=1,\qquad\sum_{a\in A(I)}r_i(q_I a)=r_i(q_I),\quad\forall I\in\mathcal{I}_i,\qquad(4)$$

where $q_I$ denotes the action sequence leading to infoset $I$.
With the above preparations, the team-maxmin equilibrium, first introduced in [33], is defined as follows [32].
Definition 4 (Team-Maxmin Equilibrium (TME)).
A TME is defined as

$$(\sigma_{\mathcal{T}}^*,\sigma_N^*)\in\arg\max_{\sigma_1,\ldots,\sigma_{N-1}}\min_{\sigma_N}u_{\mathcal{T}}(\sigma_1,\ldots,\sigma_{N-1},\sigma_N),\qquad(5)$$

where $u_{\mathcal{T}}$ stands for the team's utility, given by the expected utility over the terminal nodes reached by the joint strategy profile (together with the probabilities determined at chance nodes) if at least one terminal node is reachable, and $0$ otherwise.
A TME is generally unique, and it is an NE that maximizes the team's utility. In addition, the concept of $\epsilon$-TME can be similarly defined, at which both the team and the adversary can gain at most $\epsilon$ if any player unilaterally changes its strategy.
II-B Stackelberg Games
Stackelberg games (SGs, or leader-follower games) date back to the Stackelberg competition introduced in [35] to model a strategic game between two firms, the leader and the follower, where the leader takes actions first. SGs, as games with sequential actions and asymmetric information, have many practical applications, for example, PROTECT, a system that the United States Coast Guard utilizes to assign patrols in Boston, New York, and Los Angeles [36], and ARMOR, an assistant deployed at Los Angeles International Airport in 2007 for randomly scheduling checkpoints on the roadways entering the airport. In what follows, general Stackelberg games and Stackelberg security games [37] are introduced, the latter being an important special case of the former.
General Stackelberg Game (GSG). A GSG consists of a leader, who commits to a strategy first, and $K$ followers, who observe and learn the leader's strategy and then take actions in response to it, see Fig. 3. Denote by $[K]$, $A_l$, and $A_f$ the set of followers, the leader's pure strategies, and each follower's pure strategies, respectively. The leader knows the probability of facing follower $k\in[K]$, denoted $p^k$. Denote by $x\in\Delta(A_l)$ the mixed strategy of the leader, where the $i$th component $x_i$ represents the probability of choosing the $i$th pure strategy. Let $q_j^k\in\{0,1\}$ denote the decision of follower $k$ to take the $j$th pure strategy, such that $\sum_{j\in[|A_f|]}q_j^k=1$ for all $k\in[K]$. Note that it is enough for rational followers to consider only pure strategies [38]. For the leader and each follower $k$, the utilities (or payoffs, rewards) are captured by a pair of matrices $(R^k,C^k)$, where $R^k$ is the utility matrix of the leader when facing follower $k$, and $C^k$ is the utility matrix of follower $k$. Then, the expected utilities of the leader and follower $k$ can be, respectively, given as

$$U_l(x,q)=\sum_{k\in[K]}p^k x^\top R^k q^k,\qquad(6)$$

$$U_f^k(x,q^k)=x^\top C^k q^k,\qquad(7)$$

where $q=(q^1,\ldots,q^K)$ and $q^k=(q_1^k,\ldots,q_{|A_f|}^k)^\top$ for each $k\in[K]$.
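As a minimal sketch of the bilinear forms in (6)-(7), the snippet below evaluates the leader's and a single follower type's expected utilities and the follower's pure best response; all payoff numbers are hypothetical:

```python
# Expected utilities in a small GSG with one follower type (hypothetical payoffs).
R = [[2, 4], [1, 3]]   # leader's utility matrix R^k (rows: leader pure strategies)
C = [[1, 0], [0, 2]]   # follower's utility matrix C^k
p_k = 1.0              # probability of facing this follower type

def leader_utility(x, q):
    # U_l = p^k * x^T R q for the single follower type
    return p_k * sum(x[i] * R[i][j] * q[j] for i in range(2) for j in range(2))

def follower_utility(x, q):
    # U_f^k = x^T C q
    return sum(x[i] * C[i][j] * q[j] for i in range(2) for j in range(2))

def follower_best_response(x):
    # a rational follower plays a pure strategy: the column maximizing x^T C e_j
    vals = [sum(x[i] * C[i][j] for i in range(2)) for j in range(2)]
    j = max(range(2), key=lambda j: vals[j])
    return [1.0 if k == j else 0.0 for k in range(2)]

x = [0.5, 0.5]
q = follower_best_response(x)
print(q, leader_utility(x, q), follower_utility(x, q))
```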
Stackelberg Security Game (SSG). In an SSG, as a specific case of a GSG, the leader and followers are viewed as the defender and attackers, where the defender aims to schedule a limited number $m$ of security resources to protect (or cover) a subset of $n$ targets from the attackers' attacks, with $m<n$. The notations are the same as defined in the above GSG. Note that in this case, the leader's pure strategy set $A_l$ is composed of all possible subsets of at most $m$ targets that can be safeguarded simultaneously, and $q_t^k\in\{0,1\}$ indicates whether attacker $k$ attacks target $t$. Let $c_t$ be the probability of coverage of target $t$, i.e., $c_t=\sum_{s\in A_l:\,t\in s}x_s$, where $t\in s$ connotes that target $t$ is covered by pure strategy $s$. When facing attacker $k$ who attacks target $t$, the defender's utility is $U_d^c(t)$ if the target is covered or protected, or $U_d^u(t)$ if the target is uncovered or unprotected. The utility of attacker $k$ is $U_a^c(t)$ when attacking a target $t$ that is covered, or $U_a^u(t)$ when attacking a target $t$ that is uncovered. It is generally assumed that $U_d^c(t)>U_d^u(t)$ and $U_a^u(t)>U_a^c(t)$, which accords with common sense. The expected utilities of the defender and attacker $k$ are, respectively, expressed as

$$U_d=\sum_{k\in[K]}p^k\sum_{t\in[n]}q_t^k\big[c_t U_d^c(t)+(1-c_t)U_d^u(t)\big],\qquad(8)$$

$$U_a^k=\sum_{t\in[n]}q_t^k\big[c_t U_a^c(t)+(1-c_t)U_a^u(t)\big].\qquad(9)$$
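For concreteness, the per-target terms inside (8)-(9) can be evaluated directly from a coverage vector; the three-target payoffs below are hypothetical and respect the assumptions $U_d^c(t)>U_d^u(t)$ and $U_a^u(t)>U_a^c(t)$:

```python
# Per-target expected utilities in an SSG with n = 3 targets (hypothetical payoffs).
Ud_cov, Ud_unc = [1, 2, 1], [-2, -3, -1]   # defender: covered vs uncovered
Ua_cov, Ua_unc = [-1, -2, 0], [3, 4, 2]    # attacker: covered vs uncovered

def defender_utility(c, t):
    return c[t] * Ud_cov[t] + (1 - c[t]) * Ud_unc[t]

def attacker_utility(c, t):
    return c[t] * Ua_cov[t] + (1 - c[t]) * Ua_unc[t]

c = [0.5, 0.5, 0.0]                        # coverage from splitting one resource
best_t = max(range(3), key=lambda t: attacker_utility(c, t))
print(best_t, attacker_utility(c, best_t), defender_utility(c, best_t))
```

A rational attacker strikes the target with the highest expected attacker utility, here the completely uncovered one, which is exactly why the defender's coverage vector must anticipate the best response.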
The most widely adopted solution for GSGs and SSGs is the so-called strong Stackelberg equilibrium, which always exists in all Stackelberg games [39, 37]. Recall that it is enough for each follower to play pure strategies.
Definition 5 (Strong Stackelberg Equilibrium (SSE)).
A strategy profile $(x^*,\{q^k(x^*)\}_{k\in[K]})$ for a GSG forms an SSE if:

1) The leader plays an optimal strategy, i.e., $U_l(x^*,\{q^k(x^*)\}_k)\ge U_l(x,\{q^k(x)\}_k)$ for all mixed strategies $x$, where $q^k(x)$ denotes attacker $k$'s best response against $x$;

2) Each follower always plays a best response, i.e., $U_f^k(x,q^k(x))\ge U_f^k(x,q^k)$ for all $x$ and all pure strategies $q^k$;

3) Each follower breaks ties in favor of the leader, i.e., among all of follower $k$'s best responses to $x$, $q^k(x)$ is one that maximizes the leader's utility.
The tie-breaking rule is reasonable in cases of indifference, since the leader can often induce the favorable equilibrium by choosing a strategy arbitrarily close to the equilibrium that makes the follower strictly prefer the desired strategy [40]. When ties are instead broken in favor of the followers, the equilibrium is called a weak Stackelberg equilibrium (WSE), which, however, does not always exist [41]. Moreover, the concept of SSE can be similarly defined for SSGs.
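The SSE definition suggests a simple (if inefficient) numerical sketch: discretize the leader's mixed strategy, let the follower best-respond with ties broken in the leader's favor, and keep the best leader value. The payoffs below are hypothetical, and grid search is used only for illustration (practical solvers use mixed-integer or multiple-LP formulations):

```python
# Brute-force approximation of an SSE in a tiny GSG with one follower type.
R = [[1, 0], [0, 2]]   # leader's utility matrix (hypothetical)
C = [[1, 0], [0, 1]]   # follower's utility matrix (hypothetical)

def sse_grid(steps=1000):
    best = (float("-inf"), None)
    for s in range(steps + 1):
        x = [s / steps, 1 - s / steps]          # candidate leader mixed strategy
        f_vals = [sum(x[i] * C[i][j] for i in range(2)) for j in range(2)]
        top = max(f_vals)
        # the follower's best responses; ties broken in the leader's favor
        brs = [j for j in range(2) if abs(f_vals[j] - top) < 1e-12]
        l_val = max(sum(x[i] * R[i][j] for i in range(2)) for j in brs)
        if l_val > best[0]:
            best = (l_val, x)
    return best

val, x = sse_grid()
print(val, x)
```

Here the leader does best by committing fully to its second pure strategy, steering the follower to the column the leader prefers.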
II-C Zero-Sum Differential Games
Differential games (DGs), also known as dynamic games [41], are a natural extension of sequential games to the continuous-time scenario; they are expressed by differential equations and were first introduced by Isaacs [42]. DGs can be regarded as an extension of optimal control [43], which usually has a single decision maker with a single objective function, whereas multiple players with noncooperative objectives are involved in a DG. Since this survey is concerned with adversarial games, zero-sum DGs (mostly involving two players in the literature) are considered here, although many other types of DGs appear in the literature, including non-zero-sum differential games, mean-field games, differential graphical games, Dynkin games, and so on [44, 45].
A two-player zero-sum differential game (TPZSDG) is described by a dynamical system as

$$\dot{x}(t)=f(t,x(t),u(t),v(t)),\qquad x(t_0)=x_0,\qquad(10)$$

where $x(t)\in\mathbb{R}^n$ is the state vector at time $t$, $t_0$ is the initial time, $x_0$ is the initial state, $U$ and $V$ are control constraint sets for players 1 and 2, respectively, $u(t)\in U$ and $v(t)\in V$ are control actions (or signals) for players 1 and 2, respectively, and $f$ is the dynamics, as illustrated in Fig. 4.
For different setups in the literature, distinct cost functions are generally employed, most of which, however, are either based on or variants of the following essential and important cost function:

$$J(u,v)=\int_{t_0}^{T}g(t,x(t),u(t),v(t))\,dt+q(x(T)),\qquad(11)$$

where $g$ is the running cost (or stage cost) and $q$ is the terminal cost (or final cost).
With (11), the goal of DG (10) is for player 1 to minimize the cost $J$, while player 2 aims at maximizing it, i.e.,

$$\min_{u}\max_{v}J(u,v).\qquad(12)$$
For (12), the optimal cost is called the value of the game, expressed as a value function $V(t,x)$. Moreover, the solution notion is still the NE, as in normal-form and extensive-form zero-sum games, also called a minimax equilibrium (or minimax point, saddle point) in the literature, since the studied problem is in fact a saddle point game (or saddle point problem/optimization).
Note that dynamics (10) is deterministic. Meanwhile, stochastic DGs have also been addressed in the literature, described by stochastic differential equations driven by standard Brownian motion [44]. It is also noteworthy that the above DGs are usually studied under a set of assumptions, such as the compactness of $U$ and $V$ and the Lipschitz continuity of $f$, among others [45].
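To make (10)-(12) concrete, the sketch below simulates a scalar game with hypothetical choices: dynamics $\dot{x}=u-v$, running cost $g=x^2$, terminal cost $q=x(T)^2$, and simple feedback policies, discretized by forward Euler (an illustration only, not a solver for the game's value):

```python
# Forward-Euler simulation of a scalar TPZSDG: x' = u - v, cost = int x^2 dt + x(T)^2.
def simulate(u_policy, v_policy, x0=1.0, t0=0.0, T=2.0, dt=0.001):
    x, t, cost = x0, t0, 0.0
    while t < T:
        u, v = u_policy(t, x), v_policy(t, x)
        cost += (x * x) * dt            # accumulate the running-cost integral
        x += (u - v) * dt               # Euler step of x' = f(t, x, u, v) = u - v
        t += dt
    return cost + x * x                 # add the terminal cost q(x(T))

minimizer = lambda t, x: -1.0 if x > 0 else 1.0   # player 1 pushes x toward 0
passive   = lambda t, x: 0.0                      # an inactive player 2
opponent  = lambda t, x: -1.0 if x > 0 else 1.0   # player 2 cancels player 1's control

print(simulate(minimizer, passive))     # low cost: the state is driven to 0
print(simulate(minimizer, opponent))    # higher cost: the state is held at x0
```

The comparison shows the adversarial structure of (12): the minimizing player's achievable cost depends on how adversarially the maximizing player acts.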
Finally, the main features of the aforementioned games are summarized in Table I.
Game models  | Player numbers | Action order        | Information      | Dynamics
Zero-sum NFG | >= 2           | mostly simultaneous | symmetric        | static
Zero-sum EFG | >= 2           | sequential          | symmetric        | mostly discrete-time
GSG and SSG  | >= 2           | sequential          | asymmetric      | mostly discrete-time
Zero-sum DG  | mostly 2       | mostly simultaneous | mostly symmetric | continuous-time
III. Research Classification and Frontiers
This section succinctly summarizes the relevant literature on zero-sum games, GSGs, SSGs, and TPZSDGs along with the emerging state-of-the-art research. However, the literature on adversarial games is too vast to cover in full, and thus only the literature of our interest is reviewed throughout this survey.
III-A Zero-Sum Games (ZSGs)
Both normal-form and extensive-form ZSGs studied in the literature can be generally categorized into the following main aspects: bilinear games, saddle point problems, multiplayer ZSGs, team games, and imperfect-information ZSGs, as discussed below.

Bilinear Games. Bilinear games are simple models for delineating two-player games, generally in normal form as [46]: maximizing utilities $x^\top Ay$ and $x^\top By$ for players 1 and 2, respectively, where $A$ and $B$ are payoff matrices, subject to strategy sets $X\subseteq\mathbb{R}^m$ and $Y\subseteq\mathbb{R}^n$ for some positive integers $m$ and $n$. A bilinear game is usually denoted by the payoff matrix pair $(A,B)$, which is zero-sum when $B=-A$, and, as an important notion, the rank of a game is defined as the rank of the matrix $A+B$. Several interesting games can be viewed as special cases of bilinear games, such as bimatrix games [47, 48, 49], where $X$ and $Y$ are simplexes, imitation games (a special case of bimatrix games in which one player's payoff matrix is the identity) [50], and the Colonel Blotto game (i.e., two colonels simultaneously allocate their troops across different battlefields) [51]. In addition, multiplayer polymatrix games [52] can also be equivalently transformed into bilinear games [46]. Generally speaking, the existing literature mainly focuses on the computational complexity and polynomial-time algorithm design for approximating NEs of bilinear games [53], bimatrix games [54], polymatrix games [55], and the Colonel Blotto game [56]. Recently, it was shown that NE computation in two-player non-zero-sum games of constant rank at least three is PPAD-hard [57, 58], and computing an approximate NE to within a sufficiently small additive error is PPAD-hard even for imitation games [50], where the error depends on the number of moves available to the players; a polynomial-time algorithm was also developed for finding an approximate NE in [50]. Also, computing an NE in a tree polymatrix game with twenty actions per player is PPAD-hard [55], and a polynomial-time algorithm for approximate NEs in bimatrix games was proposed in [54], which is the state of the art in the literature. For the Colonel Blotto game, efficient and simple algorithms have recently been provided in [59, 60, 61], and meanwhile, various extended scenarios have been studied for this game, including the dynamic Colonel Blotto game [62], generalized Colonel Blotto and generalized lottery Blotto games [63], and multiplayer cases [64, 61].
Furthermore, bilinear games were generalized to hidden bilinear games in [65], where the inputs controlled by the players are first processed by a smooth function, i.e., a hidden layer, before entering the conventional bilinear game.
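The basic bilinear-game quantities above can be sketched directly: payoffs $x^\top Ay$ and $x^\top By$, the zero-sum check $B=-A$, and the game rank $\mathrm{rank}(A+B)$ (computed here by Gaussian elimination; the matrices are hypothetical examples):

```python
# Payoffs and rank of a bilinear game (A, B); zero-sum games have rank 0.
def payoff(M, x, y):
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(M)) for j in range(len(M[0])))

def rank(M, eps=1e-9):
    M = [row[:] for row in M]                   # work on a copy
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if abs(M[i][col]) > eps), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]             # pivot into position
        for i in range(len(M)):
            if i != r and abs(M[i][col]) > eps:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]                          # B = -A, hence zero-sum
S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
x, y = [0.5, 0.5], [0.5, 0.5]
print(payoff(A, x, y) + payoff(B, x, y), rank(S))   # total payoff 0.0, rank 0
```

The rank interpolates between zero-sum games (rank 0) and general bimatrix games, which is why constant-rank games are a natural target for complexity results.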

Saddle Point Problems (SPPs). SPPs are also called saddle point optimization, min-max/minimax games, or min-max/minimax optimization in the literature. The formulation of a general SPP [66] is given as $\min_{x\in X}\max_{y\in Y}f(x,y)$, where $X$ and $Y$ are closed and convex, possibly entire Euclidean spaces or their compact subsets. For general SPPs, besides zero-sum bilinear games, two other types have been extensively considered, that is, nonbilinear SPPs and bilinear SPPs. A nonbilinear SPP [67, 68] has a general coupling function $f$, while, as a special case, when $f(x,y)=g(x)+x^\top Ay-h(y)$ with convex functions $g$ and $h$, the problem is called a bilinear SPP [69, 70, 71] due to the bilinear coupling. The existing research mainly centers on equilibrium existence, computational and sampling complexity, and efficient algorithm design, for instance, as done in the aforementioned recent works. Meanwhile, various scenarios have been investigated in the literature, including projection-free methods by applying the Frank-Wolfe algorithm [72, 73], nonconvex-nonconcave general SPPs [74, 75], linear last-iterate convergence [76], SPPs with adversarial bandits and delays [77], periodic zero-sum bimatrix games with continuous strategy spaces [78], compositional SPPs [79], the decentralized setup [80], and hidden general SPPs [81], where the controlled inputs are first fed into smooth functions whose outputs are then treated as inputs to the traditional general SPPs. Finally, it is noteworthy that general SPPs with sequential actions have also been studied, called min-max Stackelberg games, for example, the recent work [82] with dependent feasible sets.
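A classical phenomenon behind the last-iterate-convergence literature cited above can be reproduced in a few lines: on the bilinear SPP $\min_x\max_y xy$, simultaneous gradient descent-ascent with a constant step spirals away from the saddle point $(0,0)$, while the time-averaged iterates stay much closer to it over this horizon (the step size and horizon are hypothetical choices):

```python
# Simultaneous gradient descent-ascent on min_x max_y f(x, y) = x * y.
eta = 0.05
x, y = 1.0, 1.0
sx, sy, n = 0.0, 0.0, 0
for _ in range(2000):
    gx, gy = y, x                  # grad_x f = y, grad_y f = x
    x, y = x - eta * gx, y + eta * gy
    sx, sy, n = sx + x, sy + y, n + 1

avg_x, avg_y = sx / n, sy / n
last_norm = (x * x + y * y) ** 0.5
avg_norm = (avg_x ** 2 + avg_y ** 2) ** 0.5
print(last_norm, avg_norm)         # last iterate far from (0, 0); average close
```

This is exactly the gap between average-iterate and last-iterate guarantees: plain GDA diverges on bilinear problems, motivating modified dynamics (e.g., extragradient or optimistic updates) for last-iterate convergence.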

Multi-Player Zero-Sum Games (MPZSGs). The games discussed above usually involve two players. It is well known that approximating an NE in multiplayer zero-sum games and even in two-player non-zero-sum games is PPAD-complete [24, 25, 26]. Moreover, it is known that multiplayer symmetric zero-sum games might have only asymmetric equilibria, consistent with the case of two-player and multiplayer symmetric non-zero-sum games, but in contrast with two-player symmetric zero-sum games, which always have symmetric equilibria (if equilibria exist) [83]. In the literature, most works focus on multiplayer zero-sum polymatrix games (also called network matrix games in some works), where the utility of each player is the sum of utilities gained by playing with its neighbors in an undirected graph [52]. The authors in [84] generalized von Neumann's minimax theorem to multiplayer zero-sum polymatrix games, thereby implying convexity of equilibria, polynomial-time tractability, and convergence of no-regret learning algorithms to NEs, and last-iterate convergence was studied in [85] for multiplayer zero-sum polymatrix games. A time-average convergence rate in the time horizon $T$ was established by using alternating gradient descent in [86]. Moreover, it has been shown that for continuous-time algorithms, time-average convergence may fail even in a simple periodic multiplayer zero-sum polymatrix game or under replicator dynamics, while being Poincaré recurrent [87, 88]. Furthermore, it has been realized that mutual cooperation among players may be more beneficial than pursuing selfish exploitability, and in this case, team/alliance formation has also been studied in the literature, for example, in [89], where it was demonstrated that team formation may be seen as a social dilemma. Additionally, other pertinent research encompasses multiplayer general-sum games [90, 91, 92] and machine-learning-based studies [93], etc.
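The polymatrix structure above is easy to make concrete: each player's utility is a sum of payoffs from two-player zero-sum edge games with its graph neighbors, so total utility is zero at every joint profile. The sketch below uses a hypothetical triangle graph with matching-pennies edge games:

```python
from itertools import product

# A three-player zero-sum polymatrix game on the triangle graph {0, 1, 2};
# every edge hosts a two-player zero-sum matching-pennies game.
def edge_payoff(a, b):
    return 1.0 if a == b else -1.0          # payoff to the lower-indexed endpoint

EDGES = [(0, 1), (1, 2), (0, 2)]

def utility(i, profile):
    total = 0.0
    for a, b in EDGES:
        if i == a:
            total += edge_payoff(profile[a], profile[b])
        elif i == b:
            total -= edge_payoff(profile[a], profile[b])   # zero-sum edge game
    return total

# Global zero-sum property: utilities sum to zero for every joint pure profile.
for profile in product([0, 1], repeat=3):
    assert abs(sum(utility(i, profile) for i in range(3))) < 1e-12
print("all joint profiles have zero total utility")
```

This pairwise-separable structure is what makes the minimax theorem and no-regret convergence results of [84] tractable in the multiplayer setting.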

Team Games (TGs).
Generically, team games refer to games in which at least one team exists, with team members cooperating via communication either before the play, during the play, both before and during the play, or not at all. Team games in the literature can generally be classified from two perspectives. One perspective depends on the number of teams: one-team games (or adversarial team games) [94], where the players in the team, enjoying the same utility function, play independently against an adversary, and two-team games [95], consisting of two teams in a game. The other perspective distinguishes perfect-information and imperfect-information games. For team games, the TME is an important solution concept, for which it is known that computing a TME is FNP-hard and inapproximable in the additive sense [96, 97]. Even so, efficient algorithms for computing a TME in perfect-information zero-sum NFGs have been developed, e.g., [94]. Meanwhile, a class of zero-sum two-team games in perfect-information normal form was studied in [95], where finding an NE is shown to be CLS-hard, i.e., unlikely to admit a polynomial-time NE computing algorithm. Moreover, as two-team games, two-network zero-sum games have also been addressed, where each network is thought of as a team [98, 99, 100]. For imperfect-information zero-sum team games, researchers have investigated a variety of scenarios centering on computational complexity and efficient algorithms, such as one-team games [32, 101, 102], one-team games with two members in the team [103], and the computation of team correlated equilibria in two-team games [104].
Imperfect-Information ZSGs (IIZSGs). Unlike perfect-information games, such as chess, Go, and backgammon, IIZSGs, involving individual players' private information that is hidden from other players, are more challenging due to information privacy and uncertainty, especially for large games with large action spaces and/or infosets. For example, the game of heads-up (i.e., two-player) limit Texas Hold'em poker, with on the order of $10^{14}$ decision points, had been a challenging problem for AI for years before being essentially solved by Cepheus [105], the first computer program to handle an imperfect-information game that is played competitively by humans. Also, the game of heads-up no-limit Texas Hold'em poker has more than $10^{160}$ decision points, for which DeepStack [106] and Libratus [107] are the first line of AI agents/algorithms to defeat professional humans. As such, most research in the literature focuses on computing NEs in two-player IIZSGs [108, 109], aiming to develop efficient superhuman AI agents in the face of the challenges of imperfect information, large models, and uncertainties. To handle large games with imperfect information, several techniques have been successively proposed, for example, pruning, abstraction, and search [110, 111, 112]. Roughly speaking, pruning aims to avoid traversing the whole game tree while ensuring the same convergence of an algorithm, and includes regret-based pruning, dynamic thresholding, best-response pruning, and so on [113]. Abstraction aims to generate a smaller version of the original game by bucketing similar infosets or actions while maintaining as much as possible the strategic features of the original game [114], mainly including information abstraction and action abstraction.
Meanwhile, search tries to improve upon the (approximate) solution of a game abstraction, which may be far from the true solution of the original game, by seeking a more precise equilibrium solution for the subgame at hand, such as depth-limited search [111, 115]. Moreover, it has been shown recently that some two-player poker games can be represented as perfect-information sequential Bayesian extensive games with efficient implementation [116]. The authors in [117] recently bridged several standing gaps between NFG and EFG learning by directly transferring desirable properties in NFGs to EFGs, simultaneously guaranteeing last-iterate convergence, lower dependence on the game size, and constant regret in games. Besides, bandit feedback is of practical importance in real-world applications of IIZSGs [118, 119], where only the interactive trajectory and the payoff of the reached terminal node can be observed, without prior knowledge of the game, such as the tree structure, the observation/state space, and transition probabilities (for Markov games) [120]. On the other hand, multiplayer IIZSGs are more challenging and have thus been less researched, except for a handful of works: for example, Pluribus [121], the first multiplayer poker agent, defeated top humans in six-player no-limit Texas Hold'em poker (the most prevalent poker variant in the world) [122], among other endeavors [123, 124, 125, 119, 126]
. Aside from deterministic methods, AI approaches based on reinforcement learning, deep neural networks and so on have achieved great success in IIZSGs
[127, 106, 128, 129, 130, 131, 132, 120, 133, 134, 135, 136, 137, 138], for instance, AlphaGo (the first AI agent to achieve superhuman level in Go) [10], AlphaZero (with initial training independent of human data and Go-specific features, reaching state-of-the-art performance in Go, chess and shogi with minimal domain knowledge) [11], and DeepStack [106], to name a few. More details can be found in a recent survey on AI in games [139]. Note that other closely related research subsumes imperfect-information general-sum games with full and bandit feedback [140, 141, 142], two-player zero-sum Markov games [143], and multiplayer general-sum Markov games [144].
It should be noted that incomplete information is also important in adversarial games, mainly comprising Bayesian games (cf. a recent survey [145]).
III-B Stackelberg Games
Stackelberg games are roughly summarized from four perspectives, i.e., GSGs, SSGs, continuous Stackelberg games, and incomplete-information Stackelberg games.

GSGs. The research on GSGs mainly lies in three aspects, i.e., computational complexity, solution methods, and applications. For computational complexity, when there is only one follower in a GSG, it is known that the problem can be solved in polynomial time, while it is NP-hard in the multiple-followers case [38]
. Regarding solution methods, an array of methods have been proposed in the literature, primarily building upon approaches for linear programming (LP) and mixed integer linear programming (MILP), including cutting plane methods, enumerative methods, and hybrid methods, among others [146, 147]. Note that both GSGs and SSGs can be formulated as bilevel optimization problems [146, 147]; bilevel optimization has a hierarchical structure with two levels, one lower-level optimization (the follower) nested in another upper-level optimization (the leader) as a constraint, and is an active research area unto itself [148]. As for practical applications, a multitude of real-world problems have been tackled using Stackelberg games, arising in economics [149], smart grids [150, 151], wireless networks [152], dynamic inspection problems [153], the industrial internet of things [154], etc. It should be noted that other relevant cases have also been studied in the literature, such as multi-leader cases [155, 156, 157, 158, 159], the case with bounded rationality [160], and general-sum games [161], etc.
SSGs. In general, SSGs can be classified by the functionality of security resources. To be specific, when every resource is capable of protecting every target, the resources are called homogeneous, and when resources are restricted to protecting only some subset of targets, they are called heterogeneous. Meanwhile, resources can also be distinguished by how many targets they are able to cover simultaneously; in this case, a notion called a schedule is assigned to a resource, with the size of the schedule defined as the number of targets that can be simultaneously covered by the resource, including the case with schedules of size 1 [162] and schedules covering multiple targets [163]. For these scenarios, the computational complexity in the presence of a single attacker was addressed in [164], as shown in Table II. With regard to solution methods, methods similar to those for GSGs can be applied to SSGs. Moreover, the practical applications of SSGs encompass wildlife protection [165], passenger screening at airports [166], crime prevention [167], cybersecurity [168], information security [169], border patrol [170, 171], and so forth. In the meantime, other scenarios have been addressed in the literature, like multi-defender cases [172, 173], Bayesian generalizations [174], and the cases with bounded rationality [175] and ambiguous information [176], etc.
TABLE II: Complexity results with a single attacker [164].

SSGs                    | Schedule size 1 | Schedule size 2 | Schedule size >= 3
Homogeneous resources   | P               | P               | NP-hard
Heterogeneous resources | P               | NP-hard         | NP-hard
Continuous Stackelberg games. This class refers to Stackelberg games with continuous strategy spaces. In general, there are two players, a leader and a follower, with cost functions $f_L(x,y)$ and $f_F(x,y)$, where $x\in X$ and $y\in Y$, and $X$ and $Y$ are closed convex (possibly compact) strategy sets for the leader and the follower, respectively. Then, the problem can be formally written as

$\min_{x\in X} f_L(x, y^*(x))$, s.t. $y^*(x)\in\arg\min_{y\in Y} f_F(x,y)$,   (13)

where the follower takes its action in response to the leader after the leader makes its decision first. In this case, a strategy $x^*$ of the leader is called a Stackelberg equilibrium strategy [177] if

$f_L(x^*, y^*(x^*)) \le f_L(x, y^*(x)), \quad \forall x\in X$,   (14)

where $y^*(x)$ is the best response of the follower against $x$. Along this line, a hierarchical Stackelberg v/s Stackelberg game was studied in [178], where the first general existence result for the games' equilibria is established without positing a single-valuedness assumption on the equilibrium of the follower-level game. Furthermore, the connections between the NE and the Stackelberg equilibrium were addressed in [177], where convergent learning dynamics are also proposed using Stackelberg gradient dynamics, which can be regarded as a sequential variant of the conventional gradient descent algorithm, and both zero-sum and general-sum games are considered therein. Additionally, as a special case of the above game (13), min-max Stackelberg games have received attention as well, where the problem is of the form $\min_{x\in X}\max_{y\in Y} f(x,y)$ with $f$ being the cost function. This problem has been investigated in the literature, especially for the case with a dependent strategy set [82, 179], i.e., inequality constraints $g(x,y)\ge 0$ are imposed on the follower for some function $g$, for which the prominent minimax theorem [29] no longer holds.

Incomplete-Information Stackelberg Games. Incomplete information means that the leader can access only partial information, or no information at all, about the followers' utility functions, moves, or behaviors. This is in contrast with traditional Stackelberg games, where the followers' information is available to the leader. This weaker scenario has been extensively considered in recent years, motivated by practical applications. For example, the authors in [180] studied situations in which only partial information on the attacker's behavior can be observed by the leader. A single-leader-multiple-followers SSG was considered in [181] with two types of misinformed information, i.e., misperception and deception, for which a stability criterion is provided for both strategic stability and cognitive stability of equilibria based on hyper NE. Additionally, one interesting direction is information deception by the follower, that is, the follower is inclined to deceive the leader by sending misinformation, such as fake payoffs, in order to benefit itself as much as possible, while, at the same time, the leader needs to detect the deceptive information so as to minimize the loss incurred by the deception. Recently, an interesting result on the nexus between the follower's deception and the leader's maximin utility was obtained in [182]: through deception, almost any (fake) Stackelberg equilibrium can be induced by the follower if and only if the leader procures at least their maximin utility at this equilibrium.
III-C Zero-Sum Differential Games
According to the existing literature, zero-sum DGs can be categorized along five main dimensions, which are not mutually exclusive but reflect different angles on the studied problems, i.e., linear-quadratic DGs, DGs with nonlinear dynamical systems, Stackelberg DGs, stochastic DGs, and terminal time and state constraints.

Linear-Quadratic DGs. This relatively simple model has been widely studied for DGs, where the dynamical systems are linear differential equations and the cost functions are quadratic [183, 184]. In general, linear-quadratic DGs are analytically and numerically solvable and find a variety of applications in reality, such as pursuit-evasion problems [185, 186]. Recently, singular linear-quadratic DGs were studied in [187]; they can be handled neither by the Isaacs min-max principle nor by the Bellman-Isaacs equation approach, and to solve this problem, an interception differential game was introduced with an appropriately regularized cost functional and a dual representation. The authors in [188] studied a linear-quadratic-Gaussian asset-defending differential game where the state information of the attacker and the defender is not accessible to each other, but the trajectory of a moving asset is known to both. Meanwhile, a two-player linear-quadratic-Gaussian pursuit-evasion DG was investigated in [189] with partial information and selected observations, where the state of one player can be observed at any time preferred by the other player, and the cost function of each player consists of the direct cost of observing and the implicit cost of exposing his state. A linear-quadratic DG with two defenders and two attackers against a stationary target was considered in [190]. Two-player mean-field linear-quadratic stochastic DGs over an infinite horizon were investigated in [191], where the existence of both open-loop and closed-loop saddle points is studied by resorting to coupled generalized algebraic Riccati equations.

Nonlinear DGs. DGs with nonlinear state dynamics have also been taken into account in the literature, given that many practical applications cannot be dealt with by linear-quadratic DGs. For example, the authors in [192] considered a class of nonlinear TPZSDGs by appealing to adaptive dynamic programming. TPZSDGs were addressed in [193] by proposing an approximate optimal critic learning algorithm based on policy iteration with a single neural network. Nonlinear DGs were also considered with time delays [194, 195, 196] and fractional-order systems [197], and were then studied in [198] with the dynamical system depending on the system's distribution and the random initial condition. Besides two players, multiplayer zero-sum DGs with uncertain nonlinear dynamics were considered and tackled using a new iterative adaptive dynamic programming algorithm in [199].

Stackelberg DGs. Motivated by the sequential actions arising in some practical applications, as in Stackelberg games, DGs with sequential actions, called Stackelberg DGs, have been broadly addressed in the literature. For instance, a linear-quadratic Stackelberg DG with mixed deterministic and stochastic controls was considered in [200], where the follower can select adapted random processes as its controller. The Stackelberg DG was employed to fight terrorism in [201]. Then, the authors in [202] investigated two classes of state-constrained Stackelberg DGs with a nonzero running cost and state constraints, for which Hamilton-Jacobi equations are established.

Stochastic DGs. In many realistic problems, the dynamics of the concerned system may not be completely modeled but subject to uncertainties and/or noise; thereby, stochastic differential equations have been leveraged to model the system dynamics in stochastic DGs [203, 204]. In this respect, the authors in [205] considered two-person zero-sum stochastic linear-quadratic DGs, along with an investigation of the open-loop saddle point and the open-loop lower and upper values. A class of stochastic DGs with ergodic payoff was studied in [206], where the diffusion system need not be nondegenerate. In addition, linear-quadratic stochastic Stackelberg DGs were considered in [207] with asymmetric roles for the players, in [208] for jump-diffusion systems, in [209] without the solvability assumption on the associated Riccati equations, and in [210] with model uncertainty. A Stackelberg stochastic DG with nonlinear dynamics and asymmetric noisy observations was addressed in [211].

Terminal Time and State Constraint. A basic classification of zero-sum DGs can be made based on the terminal time and state constraints, that is, whether the terminal time is finite (either a fixed constant or a variable to be specified) or infinite, and whether the system state is unconstrained or constrained. Along this line, the case with fixed terminal time and unconstrained state was addressed first [212], and the state-constrained case with fixed terminal time was also studied [213]. Meanwhile, the case with the terminal time being a variable has been investigated in the literature, such as [214] without state constraints and [215, 216] in the presence of state constraints but with zero running cost. Recently, the case with nonzero state constraints and undetermined terminal time was investigated in [202]. Besides the above finite-horizon cases, the infinite-horizon case has also been considered in the literature, e.g., [191, 217].
IV Prevailing Algorithms and Approaches
This section aims at encapsulating some of the main efficient algorithms and approaches for handling the adversarial games reviewed in Section II.
IV-A Zero-Sum Normal- and Extensive-Form Games
The bundle of algorithms can be roughly divided into two parts according to their applicability to normal-form games or imperfect-information extensive-form games.
For normal-form games, a large number of algorithms have been proposed so far, e.g., regret matching (RM for short, first proposed by Hart and Mas-Colell in 2000 [220]), RM+ [221], fictitious play [222, 223], double oracle [224], and online double oracle [49], among others. Therein, the most prevalent algorithms are based on regret learning, usually called no-regret (or sublinear-regret) learning algorithms, generally relying on external and internal regrets, as defined below.
The external regret and internal regret [225] for each player $i$ are, respectively, defined as

$R_i^{ext}(T) = \max_{a\in A_i} \sum_{t=1}^{T} \big[u_i(a, a_{-i}^t) - u_i(a_i^t, a_{-i}^t)\big]$,   (15)

$R_i^{int}(T) = \max_{a,b\in A_i} \sum_{t=1}^{T} \mathbb{I}\{a_i^t=a\}\big[u_i(b, a_{-i}^t) - u_i(a, a_{-i}^t)\big]$,   (16)

where the superscript $t$ stands for the iteration number, $T$ is the time horizon, and $\mathbb{I}\{E\}$ is the indicator function of an event $E$. Generally speaking, the external regret measures the greatest regret for not consistently playing a single action $a$, and the internal regret indicates the greatest regret for not swapping to action $b$ each time action $a$ was actually played. Note that weighted external and internal regrets are also defined by adding a weight at each time $t$ [226], and other regrets are considered as well in the literature, including swap regret [91] and several dynamic/static NE-based regrets [227, 228, 229, 230, 17].
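To make the external-regret machinery concrete, the following sketch runs regret matching in self-play on a small zero-sum matrix game; the payoff matrix, step count, and helper names are illustrative choices, not taken from the cited references. The time-averaged strategies approach the game's unique mixed equilibrium, in which both players play their first action with probability 0.4.

```python
# Regret matching (Hart & Mas-Colell) in self-play on a made-up 2x2
# zero-sum matrix game.  The row player maximizes, the column player
# minimizes; the unique equilibrium mixes (0.4, 0.6) for both players,
# and the game value is 0.2.  Expected (full-information) updates are
# used instead of sampled actions so the run is deterministic.

A = [[2.0, -1.0],
     [-1.0, 1.0]]  # A[i][j]: row player's payoff for (row i, column j)

def regret_matching(regrets):
    """Play proportionally to positive cumulative regrets (uniform if none)."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    n = len(regrets)
    return [p / s for p in pos] if s > 0 else [1.0 / n] * n

def selfplay(T=100_000):
    row_reg, col_reg = [0.0, 0.0], [0.0, 0.0]
    row_avg, col_avg = [0.0, 0.0], [0.0, 0.0]
    for _ in range(T):
        x = regret_matching(row_reg)
        y = regret_matching(col_reg)
        # expected payoff of each pure action against the opponent's mix
        row_u = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
        col_u = [-sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]
        row_val = sum(x[i] * row_u[i] for i in range(2))
        col_val = sum(y[j] * col_u[j] for j in range(2))
        for i in range(2):
            row_reg[i] += row_u[i] - row_val  # external-regret accumulation
            col_reg[i] += col_u[i] - col_val
            row_avg[i] += x[i]
            col_avg[i] += y[i]
    return [v / T for v in row_avg], [v / T for v in col_avg]
```

Note that the day-to-day strategies oscillate; it is only the time averages that converge, which is exactly the time-average (rather than last-iterate) guarantee discussed later in this section.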
With regrets at hand, it is now ready to present two of the most widely employed algorithms, i.e., optimistic (or predictive) follow the regularized leader (optimistic FTRL for brevity) and optimistic mirror descent (OMD for short) [85], which are, respectively, given as

$x^{t+1} = \arg\min_{x\in X} \Big\{ \Big\langle \sum_{s=1}^{t} g^s + m^{t+1}, x \Big\rangle + \frac{1}{\eta}\psi(x) \Big\}$,   (17)

and

$x^{t+1} = \arg\min_{x\in X} \Big\{ \langle m^{t+1}, x\rangle + \frac{1}{\eta} D_\psi(x, \hat{x}^t) \Big\}, \quad \hat{x}^{t+1} = \arg\min_{x\in X} \Big\{ \langle g^{t+1}, x\rangle + \frac{1}{\eta} D_\psi(x, \hat{x}^t) \Big\}$,   (18)

where $X$ is a generic closed convex constraint set, $\eta>0$ is the stepsize, $g^t$ is a subgradient of a function returned by the environment after the player commits an action at time $t$, $m^{t+1}$ is a subgradient prediction, often assumed to be $m^{t+1}=g^t$ in the literature, and $\psi$ is a strongly convex function, serving as the base function for defining the Bregman divergence $D_\psi(x,y)=\psi(x)-\psi(y)-\langle\nabla\psi(y),x-y\rangle$ for any $x,y\in X$.
Note that many widely employed algorithms, such as optimistic gradient descent ascent (OGDA) [76] and optimistic multiplicative weights update (OMWU, or optimistic hedge) [231], are special cases or variants of optimistic FTRL and OMD, and other efficient algorithms also exist, such as optimistic dual averaging (OptDA) [232], greedy weights [226], and so forth.
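As a minimal illustration of why the optimistic correction matters, the sketch below runs OGDA (the unconstrained Euclidean special case with the common prediction $m^{t+1}=g^t$, which yields the combined gradient term $2g^t - g^{t-1}$) on the bilinear game $\min_x \max_y xy$; the starting point, step size, and iteration count are arbitrary choices.

```python
# Optimistic gradient descent ascent (OGDA) on the bilinear zero-sum game
# min_x max_y x*y, whose unique saddle point is (0, 0).  Plain simultaneous
# GDA spirals away from the saddle point on this game, while the optimistic
# correction (2*g_t - g_{t-1}) yields last-iterate convergence.

def ogda(x0=1.0, y0=1.0, eta=0.1, T=5000):
    x, y = x0, y0
    gx_prev, gy_prev = y, x            # gradients at the initial point
    for _ in range(T):
        gx, gy = y, x                  # d/dx (x*y) = y,  d/dy (x*y) = x
        x, y = (x - eta * (2 * gx - gx_prev),   # descent step for x
                y + eta * (2 * gy - gy_prev))   # ascent step for y
        gx_prev, gy_prev = gx, gy
    return x, y
```

Running `ogda()` drives both coordinates essentially to zero, whereas dropping the `- gx_prev` / `- gy_prev` terms (plain GDA) makes the iterates grow in norm.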
For imperfect-information games, the most popular algorithms are based on counterfactual regret minimization (CFR) [233], whose details are introduced as follows, with the same notations as for extensive-form games in Section II-A.
Recall that $\pi^\sigma(h)$ denotes the reach probability of history $h$ under strategy profile $\sigma$. For an infoset $I$, let $\pi^\sigma(I)$ denote the probability of reaching $I$ via all possible histories in it, i.e., $\pi^\sigma(I)=\sum_{h\in I}\pi^\sigma(h)$. Denote by $\pi_i^\sigma(I)$ the reach probability of infoset $I$ due to player $i$ under $\sigma$, and by $\pi_{-i}^\sigma(I)$ the counterfactual reach probability of $I$, i.e., the probability of reaching $I$ under $\sigma$ except that player $i$'s own action probabilities are treated as $1$, that is, without the contribution of player $i$ to reaching $I$. Meanwhile, $\pi^\sigma(h,h')$ denotes the probability of going from history $h$ to a history $h'$. Then, for player $i$, the counterfactual value at a nonterminal history $h$ is defined as

$v_i^\sigma(h) = \sum_{z\in Z} \pi_{-i}^\sigma(h)\, \pi^\sigma(h, z)\, u_i(z)$,   (19)

the counterfactual value of an infoset $I$ is defined as

$v_i^\sigma(I) = \sum_{h\in I} v_i^\sigma(h)$,   (20)

and the counterfactual value of an action $a$ is defined as

$v_i^\sigma(I, a) = \sum_{h\in I} \sum_{z\in Z} \pi_{-i}^\sigma(h)\, \pi^\sigma(ha, z)\, u_i(z)$.   (21)
The instantaneous regret at iteration $t$ and the cumulative counterfactual regret up to iteration $T$ for action $a$ in infoset $I$ are, respectively, defined as

$r^t(I, a) = v_i^{\sigma^t}(I, a) - v_i^{\sigma^t}(I)$,   (22)

$R^T(I, a) = \sum_{t=1}^{T} r^t(I, a)$,   (23)

where $\sigma^t$ is the joint strategy profile leveraged at iteration $t$.
By defining $R^{T,+}(I,a)=\max\{R^T(I,a),0\}$, applying the regret matching of Hart and Mas-Colell [220] generates the strategy update

$\sigma^{T+1}(I,a) = \dfrac{R^{T,+}(I,a)}{\sum_{b\in A(I)} R^{T,+}(I,b)}$ if $\sum_{b\in A(I)} R^{T,+}(I,b) > 0$, and $\dfrac{1}{|A(I)|}$ otherwise,   (24)

with $A(I)$ being the set of actions available at $I$, and (24) is the essential CFR method for player $i$'s strategy selection. Moreover, it is known that the CFR method can guarantee convergence to NEs for the average strategy of the players, i.e.,

$\bar{\sigma}^T(I,a) = \dfrac{\sum_{t=1}^{T} \pi_i^{\sigma^t}(I)\, \sigma^t(I,a)}{\sum_{t=1}^{T} \pi_i^{\sigma^t}(I)}$.   (25)
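The CFR loop described by (19)-(25) can be sketched on Kuhn poker, the standard three-card benchmark; the following is a minimal illustrative implementation (in the spirit of common CFR tutorials), not the code of any system surveyed here.

```python
import itertools

# Vanilla CFR for Kuhn poker, using the regret-matching update (24) and the
# reach-weighted average strategy (25).  Cards 0 < 1 < 2; actions 'p'
# (pass/fold) and 'b' (bet/call).  The game value to player 1 is -1/18.

class Node:
    def __init__(self):
        self.regret_sum = {'p': 0.0, 'b': 0.0}
        self.strategy_sum = {'p': 0.0, 'b': 0.0}

    def strategy(self, reach):
        pos = {a: max(r, 0.0) for a, r in self.regret_sum.items()}
        norm = sum(pos.values())
        strat = ({a: v / norm for a, v in pos.items()} if norm > 0
                 else {a: 0.5 for a in pos})          # update (24)
        for a in strat:                               # own-reach weight, (25)
            self.strategy_sum[a] += reach * strat[a]
        return strat

    def average_strategy(self):
        norm = sum(self.strategy_sum.values())
        return {a: v / norm for a, v in self.strategy_sum.items()}

nodes = {}

def cfr(cards, history, p0, p1):
    """Counterfactual utility for the player to act; p0, p1 are reaches."""
    player = len(history) % 2
    if history in ('pp', 'bb', 'pbb'):                # showdown
        stake = 1 if history == 'pp' else 2
        return stake if cards[player] > cards[1 - player] else -stake
    if history in ('bp', 'pbp'):                      # opponent folded
        return 1
    key = str(cards[player]) + history                # the infoset I
    node = nodes.setdefault(key, Node())
    strat = node.strategy(p0 if player == 0 else p1)
    util, node_util = {}, 0.0
    for a in ('p', 'b'):
        if player == 0:
            util[a] = -cfr(cards, history + a, p0 * strat[a], p1)
        else:
            util[a] = -cfr(cards, history + a, p0, p1 * strat[a])
        node_util += strat[a] * util[a]
    opp_reach = p1 if player == 0 else p0             # counterfactual weight
    for a in ('p', 'b'):
        node.regret_sum[a] += opp_reach * (util[a] - node_util)  # (22)-(23)
    return node_util

def train(iterations=10_000):
    total = 0.0
    deals = list(itertools.permutations(range(3), 2))
    for _ in range(iterations):
        for cards in deals:
            total += cfr(cards, '', 1.0, 1.0) / len(deals)
    return total / iterations                         # avg root value
```

The averaged root value converges to the known game value of about -0.0556 for the first player, and the recovered average strategy exhibits the familiar equilibrium features, e.g., always calling a bet when holding the King.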
Hitherto, various famous variants of CFR have been developed with superior performance, including CFR+ [221, 234], discounted CFR (DCFR) [235], linear CFR (LCFR) [236], exponential CFR (ECFR) [237], AutoCFR [238], etc. More details can be found in [239, 14, 112].
Meanwhile, numerous AI methods have been brought forward in the literature [93], such as policy space response oracles (PSRO) [21, 240], neural fictitious self-play [127], deep CFR [236], single deep CFR [241], unified deep equilibrium finding (UDEF) [136], player of games (PoG) [133], neural auto-curricula (NAC) [137], and so forth. Among these methods, PSRO has been an effective approach in recent years, unifying fictitious play and double oracle algorithms. Moreover, UDEF provides a unified framework for PSRO and CFR, which are generally considered independently with their own advantages; UDEF is thus superior to both PSRO and CFR, as demonstrated by experiments on Leduc poker [136]. The recently developed PoG algorithm has unified several previous approaches by integrating guided search, self-play learning, and game-theoretic reasoning, and demonstrated, theoretically and experimentally, strong empirical performance in large perfect- and imperfect-information games, defeating the state-of-the-art agent in heads-up no-limit Texas Hold'em poker (Slumbot) [133]. Moreover, NAC, a meta-learning algorithm proposed recently in [137], points to a potential future direction for developing general multi-agent reinforcement learning (MARL) algorithms solely from data, since it learns its own objective solely from interactions with the environment, without the need for human-designed knowledge of game-theoretic principles, and it can decide by itself what the meta-solution, i.e., who to compete with, should be during training. Furthermore, it is shown that NAC is comparable or even superior to state-of-the-art population-based game solvers, such as PSRO, on a series of games, like Games of Skill, differentiable Lotto, non-transitive mixture games, iterated matching pennies, and Kuhn poker [137].
Finally, it is worth pointing out that CFR methods guarantee convergence to NEs in the sense of the empirical distribution (i.e., time average) of play, but generally fail to converge in day-to-day play (i.e., last-iterate convergence) [242, 243], although optimistic methods do converge in the last-iterate sense in two-player zero-sum games [85]. In this respect, last-iterate convergence is also of importance to explore, as demonstrated in economics and elsewhere [244, 245, 246, 76, 85].
IV-B Stackelberg Games
GSGs and SSGs can be expressed as bilevel linear programs (BLPs) or mixed integer linear programs (MILPs), which can be further transformed into or relaxed as linear programs (LPs) [146]. As mentioned in Section III-B, solving GSGs and SSGs is generally NP-hard, and most existing solution methods are variants of solution approaches for MILP and LP, including cutting plane methods, enumerative methods, hybrid methods, and so on [147]. Some of the most widely used approaches in the literature are introduced in the sequel.

Multiple LP Approach. This approach was proposed in [38] and is most widely employed for those easy problems that can be solved in polynomial time, including the case with a single follower type for GSGs [38]; it was further improved upon in [247] by merging the LPs into a single MILP. This approach has also been adapted to deal with SSGs in [164], and is generally quite efficient in the case with schedules of size 1, as well as the case with schedules of size 2 but homogeneous resources, as shown in Table II.
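The commitment-optimization idea behind the multiple-LP approach can be illustrated on a toy game: for each follower pure strategy, find the best leader mixed strategy that keeps that pure strategy a best response, then take the best over all follower strategies. In the sketch below, a fine grid search over the leader's mix stands in for the actual LPs, and the 2x2 payoff matrices are made up for illustration.

```python
# Toy illustration of the multiple-LP idea of [38] for computing an optimal
# leader commitment.  A grid search over the leader's mixed strategy replaces
# the per-follower-action LPs; the payoff matrices are invented examples.

LEADER = [[2.0, 4.0],
          [1.0, 3.0]]   # LEADER[i][j]: leader plays row i, follower column j
FOLLOWER = [[1.0, 0.0],
            [0.0, 1.0]]

def commitment_value(grid=10_001):
    best = float('-inf')
    for k in range(grid):
        p = k / (grid - 1)             # probability of the leader's first row
        x = [p, 1.0 - p]
        f_u = [sum(x[i] * FOLLOWER[i][j] for i in range(2)) for j in range(2)]
        l_u = [sum(x[i] * LEADER[i][j] for i in range(2)) for j in range(2)]
        # follower best-responds; ties broken in the leader's favor, per the
        # strong Stackelberg equilibrium convention
        m = max(f_u)
        val = max(l_u[j] for j in range(2) if f_u[j] == m)
        best = max(best, val)
    return best
```

In this example the leader's first row strictly dominates, so simultaneous play yields the leader a payoff of 2; committing to the mix (0.5, 0.5) instead steers the follower to the second column and raises the leader's payoff to 3.5, showing why commitment is valuable.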

Benders Decomposition. The Benders decomposition method, developed in [248], is effective for handling general MILP problems. The crux of this method is to divide the original problem into two other problems: one, called the master problem, obtained by relaxing some constraints, and the other called the subproblem, along with a separation problem that is the dual of the subproblem. The solution-seeking procedure then involves solving the master problem first, followed by solving the separation problem, and finally checking the feasibility and optimality conditions of the subproblem, with different contingent operations. Moreover, this approach can be improved by combining it with other techniques, such as Farkas' lemma [249] and normalized cuts [250], leading to a recent efficient algorithm called normalized Benders decomposition [147], etc.

Branch and Cut. The branch and cut method, as a hybrid method, combines the cutting plane method [251] with the branch and bound method [252]. This approach is quite effective for solving various (mixed) integer programming problems while still ensuring optimality. In general, the branch and cut algorithm is in the same spirit as the branch and bound scheme, but appends new constraints when necessary at each node by resorting to cutting plane approaches [147].

Cut and Branch. This method is similar to the branch and cut approach; the difference lies in that the extra cuts are only added at the root node, while only branching constraints are added at the other nodes. It is found in [147] that, with a suitable choice of variables in the master problem and with stabilization, cut and branch outperforms the other methods in some settings.

Gradient Descent Ascent. Gradient descent ascent, i.e., the classical gradient descent and ascent algorithm [253], is the most noticeable algorithm for solving continuous Stackelberg games, where descent and ascent operations are performed for the leader and the follower, respectively, but in a sequential order, and most other methods rest on this algorithm [82, 177]. For example, the max-oracle gradient-descent algorithm [82] is a variant of gradient descent ascent, where the ascent operation of the follower is directly replaced with an approximate best response provided by a max-oracle.
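The max-oracle idea can be sketched on a made-up quadratic example: the leader minimizes f(x, y) = x^2 + x*y - y^2 while the follower maximizes it, so the follower's best response is y*(x) = x/2 (from x - 2y = 0) and the Stackelberg solution is x = y = 0. Here a closed-form best response stands in for the max-oracle; the step size and iteration count are illustrative.

```python
# Sketch of max-oracle gradient descent for a continuous Stackelberg game:
# the leader takes gradient steps on x while an oracle returns the
# follower's exact best response y*(x).  The quadratic f is an invented
# example, not one from the cited works.

def best_response(x):
    return x / 2.0                     # argmax_y (x*x + x*y - y*y)

def max_oracle_gd(x=2.0, eta=0.05, T=500):
    for _ in range(T):
        y = best_response(x)           # follower replies optimally
        grad_x = 2.0 * x + y           # df/dx at (x, y)
        x -= eta * grad_x              # leader's descent step
    return x, best_response(x)
```

Note that at the best response, the partial derivative df/dx equals the total derivative of the leader's composite objective f(x, y*(x)) (an envelope-theorem observation), which is why stepping on the partial gradient suffices here.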
IV-C Zero-Sum Differential Games
Among the methods for solving zero-sum DGs, the viscosity solution approach is the most widely exploited, for which it is known that the value function is the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. In the sequel, this approach is introduced for DGs (10) and (11); other detailed cases can be found in [45, 256].
For DGs (10) and (11), the Hamiltonian is defined as

$H(t, x, s) = \min_{u\in U} \max_{v\in V}\, \langle s, f(t, x, u, v)\rangle$,   (26)

and the HJI equation is given as

$\partial_t \varphi(t,x) + H\big(t, x, \nabla_x \varphi(t,x)\big) = 0, \quad \varphi(T, x) = \sigma(x)$,   (27)

where the second condition is called the terminal condition, $\sigma$ is the terminal payoff function, and $\partial_t\varphi$ and $\nabla_x\varphi$ represent the (sub)gradients with respect to $t$ and $x$, respectively.
Let $\Phi$ denote the set of functions $\varphi(t,x)$ satisfying the continuity condition in $(t,x)$ and the Lipschitz condition in $x$ on every bounded subset. From [195], it is known that if a function $\varphi\in\Phi$ is coinvariantly differentiable at each point, satisfies the HJI equation (27), and meets the terminal condition, then $\varphi$ is the value function of the differential game (10) and (11), and the optimal control strategies of the two players are given as

$u^\circ(t,x) \in \arg\min_{u\in U} \max_{v\in V}\, \langle s, f(t,x,u,v)\rangle, \quad v^\circ(t,x) \in \arg\max_{v\in V} \min_{u\in U}\, \langle s, f(t,x,u,v)\rangle$,   (28)

where

$s = \nabla_x \varphi(t, x)$.   (29)
Moreover, it should be noted that AI methods have also been applied to solve differential games; for example, reinforcement learning was employed to deal with multiplayer nonlinear differential games in [257], where a novel two-level value-iteration-based integral reinforcement learning algorithm was proposed that depends only upon partial information of the system dynamics.
V Applications
This section provides some practical applications of adversarial games. As a matter of fact, adversarial games have been leveraged to solve a large volume of realistic problems in the literature, as illustrated in Fig. 5, including poker [133], StarCraft [258], politics [259], infrastructure security [13], pursuit-evasion problems [186], border defense [170, 260, 19], national defense [18], communication scheduling [261], autonomous driving [262], homeland security [263], etc. In what follows, we provide three well-known examples to illustrate these applications.
Example 1 (Radar Jamming).
Radar jamming is one of the widely studied applications of zero-sum games in modern electronic warfare [264, 265]. In radar jamming, there are two players: a radar, which aims to detect a target with as high a probability as possible, and a jammer, which aims at minimizing the radar's detection probability by jamming it. Therefore, the two players are diametrically opposed, and the scenario forms a two-player zero-sum game (cf. Fig. 6 for a schematic illustration). Usually, according to the type of the target, utility functions can be constructed for distinct jamming scenarios, and constraints can be described mathematically based on physical limitations, such as jammer power, the spatial extent of jamming, and the threshold parameter and reference window size of the radar. For example, a Swerling Type II target is assumed in [266] in the presence of Rayleigh-distributed clutter, for which utility functions are built for cell-averaging and order-statistic constant false alarm rate (CFAR) processors in three jamming scenarios, i.e., ungated range noise, range-gated noise, and false-target jamming.
Example 2 (Border Patrols).
It is an important task for a country to secure its national borders against illicit activities involving drugs, contraband, and stowaways, etc. In this spirit, border patrols are introduced here as one application of SSGs, proposed with Carabineros de Chile [170, 171] to thwart drug trafficking, contraband and illegal entry. To this end, both day and night shift patrols along the border are arranged by Carabineros according to distinct requirements.
Here, the focus is on the night shift patrols. To make them practically implementable, the region is partitioned into police precincts, some of which are paired up when scheduling the patrols, because of the vast expanses and harsh landscape at the border and the limitation of manpower. In addition, a set of vantage locations has been identified by Carabineros along the border of the region, which are suited for conducting surveillance with high-tech equipment, like heat sensors and night goggles. A night shift action means the deployment of a joint detail, with personnel from two paired precincts, to carry out overnight vigilance at a vantage location within the realm of the paired precincts. Meanwhile, in consideration of logistical constraints, a joint detail is deployed for every precinct pair to a surveillance location once a week. Fig. 7 illustrates such a case with its pairings, precincts and locations.
Example 3 (PursuitEvasion Problems).
Pursuit-evasion problems are among the prevalent applications of zero-sum DGs and have been widely applied to many practical problems, such as surveillance and navigation, in robotics, aerospace, and so forth. In pursuit-evasion problems, there usually exist a collection of pursuers and evaders (one pursuer and one evader in the simplest case), possibly with a moving target or a stationary target set/area, and the pursuers aim to capture or intercept the evaders, who have opposed objectives [186]. As a concrete example, consider a case where one pursuer (or defender) protects a maritime coastline or border from attack by two slower aircraft (or evaders). The pursuer needs to pursue the evaders sequentially and strives to intercept them as far as possible from the coastline. Meanwhile, the two evaders can collaborate and strive to minimize their combined distance to the coastline before they are intercepted. For this problem, a regular solution of the differential game was provided in [267].
VI Possible Future Directions
In view of the challenges in adversarial games, this section attempts to present potential future research directions, as discussed in the sequel.

Efficient Algorithms Design. Even though a wide range of algorithms have been proposed in the literature, as introduced above, efficient, fast and optimal algorithms under limited computing, storage, and memory capabilities remain the overarching research direction in (adversarial) games and artificial intelligence, and are far from fully explored, spanning a plethora of scenarios, e.g., equilibrium computation [226], real-time strategy (RTS) making [268], exploiting suboptimal opponents [269], attack resiliency [270], and so forth.

Last-Iterate Convergence. In general, no-regret learning can guarantee convergence of the empirical distribution of play (i.e., time-average convergence) for each player to the set of NEs. However, last-iterate convergence fails in general [242, 243], although restricted classes of games do enjoy last-iterate convergence under no-regret learning algorithms, such as two-player zero-sum games [85]. Note that last-iterate convergence is important in many practical applications, for example, generative adversarial networks (GANs) [271] and economics [231], and has been receiving growing interest in recent years [272].
Large Games. For adversarial games with large action spaces and/or infosets, practical limitations, such as limited computing resources, impose the need for efficient algorithms amenable to implementation with limited computation, storage, and even communication [274].

Incomplete Information. Incomplete information is another main hallmark of many adversarial games and one of their main sources of difficulty. Generally speaking, game uncertainties, such as parameter uncertainty, action outcome uncertainty, and underlying world-state uncertainty, can be subsumed in the category of incomplete information, and the main studied models are Bayesian and interval models [275, 145, 276].

Bounded Rationality. Completely rational players are often assumed in the study of games. Nonetheless, irrational players naturally appear in practice, which has triggered an increasing interest in games with bounded rationality, e.g., behavior models such as lens-QR models, prospect-theory-inspired models and quantal response models [277, 278, 279].

Dynamic Environments. Most games have been investigated as static ones, i.e., with time-invariant game rules. However, due to the possibly dynamic characteristics of the environment within which players compete, online (or time-varying) games deserve further attention in the future, where each player's utility function is time-varying or even adversarial, without any distributional assumptions [227, 228, 229, 230, 17].
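
The online-game viewpoint can be made concrete with a no-regret learner facing a payoff vector that changes every round. In the sketch below (step size and the adversarially flipping payoff stream are illustrative assumptions), Hedge keeps the average regret small even though no distributional assumption holds.

```python
import math

# A player repeatedly picks a mixed strategy over n actions; the environment
# (e.g., an opponent) reveals a payoff vector g_t each round, which here flips
# adversarially. Hedge still guarantees small average regret.

def hedge_time_varying(payoff_stream, eta):
    n = len(payoff_stream[0])
    w = [1.0] * n                          # multiplicative weights
    total, best_fixed = 0.0, [0.0] * n
    for g in payoff_stream:
        p = [wi / sum(w) for wi in w]      # current mixed strategy
        total += sum(pi * gi for pi, gi in zip(p, g))
        for i in range(n):
            best_fixed[i] += g[i]          # payoff of always playing i
            w[i] *= math.exp(eta * g[i])
    regret = max(best_fixed) - total
    return regret / len(payoff_stream)     # average (per-round) regret

# The payoff vector flips every round: no fixed action does well overall.
T = 10000
stream = [[1.0, -1.0] if t % 2 == 0 else [-1.0, 1.0] for t in range(T)]
avg_regret = hedge_time_varying(stream, eta=0.05)
```

The same template, with properly tuned or adaptive step sizes, underlies much of the online-learning-in-games literature cited above.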

Hybrid Games. It is known that many realistic adversarial games involve both continuous and discrete physical dynamics that govern players' motion or rules of change, which can be cast in the framework of hybrid games [280, 281]. In this respect, how to combine game theory with control dynamics is an important yet challenging research area.
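
The flavor of coupling strategies with dynamics can be conveyed by a toy discrete-time pursuit-evasion game on a line, where each player applies its saddle-point feedback strategy (move toward / flee at top speed). This is only a caricature under assumed speeds and positions, not a hybrid-systems solver.

```python
# Toy pursuit-evasion on the real line: the pursuer moves toward the evader,
# the evader flees in the same direction; with vp > ve the gap closes at
# rate vp - ve per step, so capture time is predictable.

def pursuit_time(p, e, vp, ve, capture_radius=0.5, max_steps=10000):
    """Return the number of steps until capture, or None if the evader
    survives the horizon (as happens when ve >= vp)."""
    for t in range(max_steps):
        if abs(p - e) <= capture_radius:
            return t
        direction = 1.0 if e > p else -1.0
        p += vp * direction          # pursuer's feedback strategy: chase
        e += ve * direction          # evader's feedback strategy: flee
    return None

t_capture = pursuit_time(p=0.0, e=10.0, vp=1.0, ve=0.6)  # gap closes 0.4/step
```

Genuine hybrid games add discrete mode switches (e.g., changing rules or terrains) on top of such continuous dynamics, which is where the cited frameworks come in.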

AI in Games. Recent years have witnessed great progress in applying AI methods to games, integrating advanced approaches from reinforcement learning, neural networks, meta-learning, and so on [282, 283, 135, 284]. With the advent of modern high-tech and big-data missions, AI methods provide an effective way to commit to real-time strategies by exploiting only offline or real-time streaming data [139].
VII Conclusion
Adversarial games play a significant role in practical applications, and this survey has provided a systematic overview of them in three main categories, i.e., zero-sum normal- and extensive-form games, Stackelberg (security) games, and zero-sum differential games. To this end, several distinct angles have been employed to anatomize adversarial games, ranging from game models, solution concepts, problem classifications, research frontiers, prevailing algorithms, and real-world applications to potential future directions. In general, this survey has attempted to summarize past research as completely as possible, although the existing literature is too vast to cover in its entirety. To the best of our knowledge, this survey is the first to present a systematic overview of adversarial games. Finally, possible future directions have also been discussed.
References
 [1] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, 2nd ed. Princeton University Press, 1947.
 [2] J. F. Nash, “Equilibrium points in n-person games,” Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48–49, 1950.
 [3] J. Nash, “Non-cooperative games,” Annals of Mathematics, vol. 54, no. 2, pp. 286–295, 1951.
 [4] D. Fudenberg and J. Tirole, Game Theory. MIT Press, 1991.
 [5] M. J. Osborne and A. Rubinstein, A Course in Game Theory. MIT Press, 1994.
 [6] T. Başar and G. Zaccour, Handbook of Dynamic Game Theory. Springer International Publishing, 2018.
 [7] R. J. Aumann, M. Maschler, and R. E. Stearns, Repeated Games with Incomplete Information. MIT Press, 1995.
 [8] N. Bard, J. Hawkin, J. Rubin, and M. Zinkevich, “The annual computer poker competition,” AI Magazine, vol. 34, no. 2, pp. 112–112, 2013.
 [9] T. H. Nguyen, D. Kar, M. Brown, A. Sinha, A. X. Jiang, and M. Tambe, “Towards a science of security games,” in Mathematical Sciences with Multidisciplinary Applications, 2016, pp. 347–381.
 [10] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
 [11] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.
 [12] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel et al., “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018.
 [13] A. Sinha, F. Fang, B. An, C. Kiekintveld, and M. Tambe, “Stackelberg security games: Looking beyond a decade of success,” in International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 2018, pp. 5494–5501.
 [14] H. Li, X. Wang, F. Jia, Y. Li, and Q. Chen, “A survey of Nash equilibrium strategy solving based on CFR,” Archives of Computational Methods in Engineering, vol. 28, no. 4, pp. 2749–2760, 2021.
 [15] M. K. Sohrabi and H. Azgomi, “A survey on the combined use of optimization methods and game theory,” Archives of Computational Methods in Engineering, vol. 27, no. 1, pp. 59–80, 2020.
 [16] J. Wang, Y. Hong, J. Wang, J. Xu, Y. Tang, Q.-L. Han, and J. Kurths, “Cooperative and competitive multiagent systems: From optimization to games,” IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 5, pp. 763–783, 2022.
 [17] X. Li, L. Xie, and N. Li, “A survey of decentralized online learning,” arXiv preprint arXiv:2205.00473, 2022.
 [18] E. Ho, A. Rajagopalan, A. Skvortsov, S. Arulampalam, and M. Piraveenan, “Game theory in defence applications: A review,” Sensors, vol. 22, no. 3, p. 1032, 2022.
 [19] D. Shishika and V. Kumar, “A review of multiagent perimeter defense games,” in International Conference on Decision and Game Theory for Security, College Park, USA, 2020, pp. 472–485.
 [20] M. Zhu, A. H. Anwar, Z. Wan, J.-H. Cho, C. A. Kamhoua, and M. P. Singh, “A survey of defensive deception: Approaches using game theory and machine learning,” IEEE Communications Surveys & Tutorials, vol. 23, no. 4, pp. 2460–2493, 2021.
 [21] M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Pérolat, D. Silver, and T. Graepel, “A unified game-theoretic approach to multiagent reinforcement learning,” in Advances in Neural Information Processing Systems, vol. 30, Long Beach, CA, USA, 2017.
 [22] M. L. Littman, “Markov games as a framework for multiagent reinforcement learning,” in Machine Learning Proceedings, 1994, pp. 157–163.
 [23] S. Zamir et al., “Bayesian games: Games with incomplete information,” Tech. Rep., 2008.
 [24] X. Chen, X. Deng, and S.-H. Teng, “Settling the complexity of computing two-player Nash equilibria,” Journal of the ACM (JACM), vol. 56, no. 3, pp. 1–57, 2009.
 [25] C. Daskalakis, P. W. Goldberg, and C. H. Papadimitriou, “The complexity of computing a Nash equilibrium,” SIAM Journal on Computing, vol. 39, no. 1, pp. 195–259, 2009.
 [26] A. Rubinstein, Hardness of Approximation Between P and NP. Morgan & Claypool, 2019.
 [27] R. J. Aumann, “Subjectivity and correlation in randomized strategies,” Journal of Mathematical Economics, vol. 1, no. 1, pp. 67–96, 1974.
 [28] J. Hannan, “Approximation to Bayes risk in repeated play,” Contributions to the Theory of Games, vol. 3, no. 2, pp. 97–139, 1957.
 [29] J. von Neumann, “Zur Theorie der Gesellschaftsspiele,” Mathematische Annalen, vol. 100, no. 1, pp. 295–320, 1928.
 [30] G. Farina, T. Bianchi, and T. Sandholm, “Coarse correlation in extensive-form games,” in AAAI Conference on Artificial Intelligence, vol. 34, no. 2, 2020, pp. 1934–1941.
 [31] A. Celli, S. Coniglio, and N. Gatti, “Computing optimal coarse correlated equilibria in sequential games,” arXiv preprint arXiv:1901.06221, 2019.
 [32] A. Celli and N. Gatti, “Computational results for extensive-form adversarial team games,” in AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
 [33] B. von Stengel and D. Koller, “Team-maxmin equilibria,” Games and Economic Behavior, vol. 21, no. 1–2, pp. 309–321, 1997.
 [34] S. Omidshafiei, C. Papadimitriou, G. Piliouras, K. Tuyls, M. Rowland, J.-B. Lespiau, W. M. Czarnecki, M. Lanctot, J. Perolat, and R. Munos, “α-Rank: Multi-agent evaluation by evolution,” Scientific Reports, vol. 9, no. 1, pp. 1–29, 2019.
 [35] H. von Stackelberg, Marktform und Gleichgewicht. Springer-Verlag, Berlin, 1934.
 [36] B. An, F. Ordóñez, M. Tambe, E. Shieh, R. Yang, C. Baldwin, J. DiRenzo III, K. Moretti, B. Maule, and G. Meyer, “A deployed quantal response-based patrol planning system for the U.S. coast guard,” Interfaces, vol. 43, no. 5, pp. 400–420, 2013.
 [37] C. Casorrán, B. Fortz, M. Labbé, and F. Ordóñez, “A study of general and security Stackelberg game formulations,” European Journal of Operational Research, vol. 278, no. 3, pp. 855–868, 2019.
 [38] V. Conitzer and T. Sandholm, “Computing the optimal strategy to commit to,” in Proceedings of the 7th ACM conference on Electronic Commerce, Michigan, USA, 2006, pp. 82–90.
 [39] G. Leitmann, “On generalized Stackelberg strategies,” Journal of Optimization Theory and Applications, vol. 26, no. 4, pp. 637–643, 1978.
 [40] H. von Stackelberg, Market Structure and Equilibrium. Springer Science & Business Media, 2011.
 [41] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory. SIAM, 1998.
 [42] R. Isaacs, Differential Games. Wiley, New York, 1965.
 [43] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. John Wiley & Sons, 2012.
 [44] R. Buckdahn, P. Cardaliaguet, and M. Quincampoix, “Some recent aspects of differential game theory,” Dynamic Games and Applications, vol. 1, no. 1, pp. 74–114, 2011.
 [45] A. Friedman, Differential Games. Courier Corporation, 2013.
 [46] J. Garg, A. X. Jiang, and R. Mehta, “Bilinear games: Polynomial time algorithms for rank based subclasses,” in International Workshop on Internet and Network Economics, Singapore, 2011, pp. 399–407.
 [47] C. E. Lemke and J. T. Howson, Jr, “Equilibrium points of bimatrix games,” Journal of the Society for Industrial and Applied Mathematics, vol. 12, no. 2, pp. 413–423, 1964.
 [48] I. Anagnostides and P. Penna, “Solving zero-sum games through alternating projections,” arXiv preprint arXiv:2010.00109, 2021.
 [49] L. C. Dinh, Y. Yang, Z. Tian, N. P. Nieves, O. Slumbers, D. H. Mguni, H. B. Ammar, and J. Wang, “Online double oracle,” arXiv preprint arXiv:2103.07780, 2021.
 [50] A. Murhekar, “Approximate Nash equilibria of imitation games: Algorithms and complexity,” in International Conference on Autonomous Agents and Multiagent Systems, 2020, pp. 887–894.
 [51] E. Borel, “La théorie du jeu et les équations intégrales à noyau symétrique,” Comptes rendus de l’Académie des Sciences, vol. 173, pp. 1304–1308, 1921.
 [52] J. T. Howson Jr, “Equilibria of polymatrix games,” Management Science, vol. 18, no. 5-part-1, pp. 312–318, 1972.
 [53] G. Sengodan and C. Arumugasamy, “Linear complementarity problems and bilinear games,” Applications of Mathematics, vol. 65, no. 5, pp. 665–675, 2020.
 [54] A. Deligkas, M. Fasoulakis, and E. Markakis, “A polynomial-time algorithm for approximate Nash equilibria in bimatrix games,” arXiv preprint arXiv:2204.11525, 2022.
 [55] A. Deligkas, J. Fearnley, and R. Savani, “Tree polymatrix games are PPAD-hard,” arXiv preprint arXiv:2002.12119, 2020.
 [56] S. Seddighin, “Campaigning via LPs: Solving Blotto and Beyond,” Ph.D. dissertation, University of Maryland, College Park, 2019.
 [57] R. Mehta, “Constant rank two-player games are PPAD-hard,” SIAM Journal on Computing, vol. 47, no. 5, pp. 1858–1887, 2018.
 [58] S. Boodaghians, J. Brakensiek, S. B. Hopkins, and A. Rubinstein, “Smoothed complexity of 2-player Nash equilibria,” in Annual Symposium on Foundations of Computer Science, 2020, pp. 271–282.
 [59] S. Behnezhad, A. Blum, M. Derakhshan, M. Hajiaghayi, C. H. Papadimitriou, and S. Seddighin, “Optimal strategies of Blotto games: Beyond convexity,” in Proceedings of ACM Conference on Economics and Computation, Phoenix, AZ, USA, 2019, pp. 597–616.
 [60] S. Behnezhad, S. Dehghani, M. Derakhshan, M. Hajiaghayi, and S. Seddighin, “Fast and simple solutions of Blotto games,” Operations Research, DOI: 10.1287/opre.2022.2261, 2022.
 [61] D. Beaglehole, “An efficient approximation algorithm for the Colonel Blotto game,” arXiv preprint arXiv:2201.10758, 2022.
 [62] V. Leon and S. R. Etesami, “Bandit learning for dynamic Colonel Blotto game with a budget constraint,” arXiv preprint arXiv:2103.12833, 2021.
 [63] D. Q. Vu, P. Loiseau, and A. Silva, “Approximate equilibria in generalized Colonel Blotto and generalized Lottery Blotto games,” arXiv preprint arXiv:1910.06559, 2019.
 [64] E. Boix-Adserà, B. L. Edelman, and S. Jayanti, “The multiplayer Colonel Blotto game,” Games and Economic Behavior, vol. 129, pp. 15–31, 2021.
 [65] E.-V. Vlatakis-Gkaragkounis, L. Flokas, and G. Piliouras, “Poincaré recurrence, cycles and spurious equilibria in gradient-descent-ascent for non-convex non-concave zero-sum games,” in Advances in Neural Information Processing Systems, vol. 32, Vancouver, BC, Canada, 2019, pp. 1–12.
 [66] G. Zhang, Y. Wang, L. Lessard, and R. B. Grosse, “Near-optimal local convergence of alternating gradient descent-ascent for minimax optimization,” in International Conference on Artificial Intelligence and Statistics, 2022, pp. 7659–7679.
 [67] E. Y. Hamedani and N. S. Aybat, “A primal-dual algorithm with line search for general convex-concave saddle point problems,” SIAM Journal on Optimization, vol. 31, no. 2, pp. 1299–1329, 2021.
 [68] V. Tominin, Y. Tominin, E. Borodich, D. Kovalev, A. Gasnikov, and P. Dvurechensky, “On accelerated methods for saddle-point problems with composite structure,” arXiv preprint arXiv:2103.09344, 2021.
 [69] G. Xie, Y. Han, and Z. Zhang, “DIPPA: An improved method for bilinear saddle point problems,” arXiv preprint arXiv:2103.08270, 2021.
 [70] D. Kovalev, A. Gasnikov, and P. Richtárik, “Accelerated primal-dual gradient method for smooth and convex-concave saddle-point problems with bilinear coupling,” arXiv preprint arXiv:2112.15199, 2021.
 [71] K. K. Thekumparampil, N. He, and S. Oh, “Lifted primal-dual method for bilinearly coupled smooth minimax optimization,” arXiv preprint arXiv:2201.07427, 2022.
 [72] G. Gidel, T. Jebara, and S. Lacoste-Julien, “Frank-Wolfe algorithms for saddle point problems,” in International Conference on Artificial Intelligence and Statistics, Florida, USA, 2017, pp. 362–371.
 [73] C. Chen, L. Luo, W. Zhang, and Y. Yu, “Efficient projection-free algorithms for saddle point problems,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 10799–10808.
 [74] H. Li, Y. Tian, J. Zhang, and A. Jadbabaie, “Complexity lower bounds for nonconvex-strongly-concave min-max optimization,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 1–13.
 [75] Y.-P. Hsieh, P. Mertikopoulos, and V. Cevher, “The limits of min-max optimization algorithms: Convergence to spurious non-critical sets,” in International Conference on Machine Learning, 2021, pp. 4337–4348.
 [76] C.-Y. Wei, C.-W. Lee, M. Zhang, and H. Luo, “Linear last-iterate convergence in constrained saddle-point optimization,” in International Conference on Learning Representations, 2021, pp. 1–12.
 [77] I. Bistritz, Z. Zhou, X. Chen, N. Bambos, and J. Blanchet, “No weighted-regret learning in adversarial bandits with delays,” Journal of Machine Learning Research, vol. 23, pp. 1–43, 2022.
 [78] T. Fiez, R. Sim, S. Skoulakis, G. Piliouras, and L. Ratliff, “Online learning in periodic zero-sum games,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 1–13.
 [79] H. Gao, X. Wang, L. Luo, and X. Shi, “On the convergence of stochastic compositional gradient descent ascent method,” in International Joint Conference on Artificial Intelligence, 2021, pp. 1–7.
 [80] A. Beznosikov, G. Scutari, A. Rogozin, and A. Gasnikov, “Distributed saddle-point problems under data similarity,” in Advances in Neural Information Processing Systems, vol. 34, 2021.
 [81] E.-V. Vlatakis-Gkaragkounis, L. Flokas, and G. Piliouras, “Solving min-max optimization with hidden structure via gradient descent ascent,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 1–14.
 [82] D. Goktas and A. Greenwald, “Convex-concave min-max Stackelberg games,” in Advances in Neural Information Processing Systems, vol. 34, 2021.
 [83] D. Xefteris, “Symmetric zero-sum games with only asymmetric equilibria,” Games and Economic Behavior, vol. 89, pp. 122–125, 2015.
 [84] Y. Cai and C. Daskalakis, “On minmax theorems for multiplayer games,” in Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, 2011, pp. 217–234.
 [85] I. Anagnostides, I. Panageas, G. Farina, and T. Sandholm, “On last-iterate convergence beyond zero-sum games,” arXiv preprint arXiv:2203.12056, 2022.
 [86] J. P. Bailey, “O(1/T) time-average convergence in a generalization of multiagent zero-sum games,” arXiv preprint arXiv:2110.02482, 2021.
 [87] T. Fiez, R. Sim, S. Skoulakis, G. Piliouras, and L. Ratliff, “Online learning in periodic zero-sum games: von Neumann vs Poincaré.”
 [88] S. Skoulakis, T. Fiez, R. Sim, G. Piliouras, and L. Ratliff, “Evolutionary game theory squared: Evolving agents in endogenously evolving zero-sum games,” in AAAI Conference on Artificial Intelligence, 2021, pp. 1–9.
 [89] E. Hughes, T. W. Anthony, T. Eccles, J. Z. Leibo, D. Balduzzi, and Y. Bachrach, “Learning to resolve alliance dilemmas in many-player zero-sum games,” arXiv preprint arXiv:2003.00799, 2020.
 [90] S. Ganzfried, “Fast complete algorithm for multiplayer Nash equilibrium,” arXiv preprint arXiv:2002.04734, 2020.
 [91] I. Anagnostides, C. Daskalakis, G. Farina, M. Fishelson, N. Golowich, and T. Sandholm, “Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games,” arXiv preprint arXiv:2111.06008, 2021.
 [92] I. Anagnostides, G. Farina, C. Kroer, A. Celli, and T. Sandholm, “Faster no-regret learning dynamics for extensive-form correlated and coarse correlated equilibria,” arXiv preprint arXiv:2202.05446, 2022.
 [93] G. Gidel, “Multiplayer games in the era of machine learning,” Ph.D. dissertation, Université de Montréal, 2020.
 [94] Y. Zhang and B. An, “Converging to team-maxmin equilibria in zero-sum multiplayer games,” in International Conference on Machine Learning, 2020, pp. 11033–11043.
 [95] F. Kalogiannis, E.-V. Vlatakis-Gkaragkounis, and I. Panageas, “Teamwork makes von Neumann work: Min-max optimization in two-team zero-sum games,” arXiv preprint arXiv:2111.04178, 2021.
 [96] K. A. Hansen, T. D. Hansen, P. B. Miltersen, and T. B. Sørensen, “Approximability and parameterized complexity of minmax values,” in International Workshop on Internet and Network Economics, 2008, pp. 684–695.
 [97] C. Borgs, J. Chayes, N. Immorlica, A. T. Kalai, V. Mirrokni, and C. Papadimitriou, “The myth of the folk theorem,” Games and Economic Behavior, vol. 70, no. 1, pp. 34–43, 2010.
 [98] B. Gharesifard and J. Cortés, “Distributed convergence to Nash equilibria in two-network zero-sum games,” Automatica, vol. 49, no. 6, pp. 1683–1692, 2013.