In designing a strategy for a multiagent interaction, an agent must balance the assumption that opponents are behaving rationally against the risks that may arise if opponents behave irrationally. Most classic game-theoretic solution concepts, such as Nash equilibrium (NE), assume that all players are behaving rationally (and that this fact is common knowledge). On the other hand, a maximin strategy is one with the largest guaranteed worst-case expected payoff. This limits the potential downside against a worst-case and potentially irrational opponent, but can also cause us to achieve significantly lower payoff against rational opponents. In two-player zero-sum games, Nash equilibrium and maximin strategies are equivalent (by the minimax theorem), and these two goals are completely aligned. But in non-zero-sum games and games with more than two players, this is not the case. In these games we can potentially obtain arbitrarily low payoff by following a Nash equilibrium strategy, but if we follow a maximin strategy we will likely be playing far too conservatively. Both the assumption that opponents exhibit a degree of rationality and the desire to limit worst-case performance against irrational opponents are reasonable, yet neither the Nash equilibrium nor the maximin solution concept is definitively compelling on its own.
We propose a new solution concept that balances between these two extremes. In a two-player general-sum game, we define an $\alpha$-safe equilibrium ($\alpha$-SE) as a strategy profile where each player $i$ is playing a strategy that minimizes performance of the opponent with probability $\alpha_i$, and is playing a best response to the opponent’s strategy with probability $1 - \alpha_i$, where $\alpha = (\alpha_1, \alpha_2)$. As a special case, if we are interested in constructing a strategy for player 1, we can set $\alpha_1 = 0$, assuming irrationality just for player 2. We can generalize this to an $n$-player game by assuming that all players $i \neq 1$ are playing a strategy that minimizes player 1’s expected payoff with probability $\alpha_i$, and are playing a best response to all other players’ strategies with probability $1 - \alpha_i$, while player 1 plays a best response to all other players’ strategies. This concept balances explicitly between the assumption of players’ rationality and the desire to ensure safety in the worst case through the $\alpha_i$ parameters. From a theoretical perspective we show that an $\alpha$-safe equilibrium is always guaranteed to exist and is PPAD-hard to compute (assuming the $\alpha_i$ are fixed constants). Thus, it has the same existence and complexity results as Nash equilibrium.
Several other game-theoretic solution concepts have been previously proposed to account for degrees of opponents’ rationality. The most prominent is trembling-hand perfect equilibrium (THPE), which is a refinement of Nash equilibrium that is robust to the possibility that players “tremble” and play each pure strategy with arbitrarily small probability. The concept of $\alpha$-safe equilibrium differs from THPE in several key ways. First, it allows a player to specify an arbitrary belief on the probability that each other player is irrational, rather than assume that it is an extremely small value. In domains like national security or driving we risk losing lives if we fail to properly account for opponents’ irrationality, and may elect to use larger values for $\alpha_i$ than in situations where safety is less of a concern. In an $\alpha$-SE a player can specify the values for $\alpha_i$ based on prior beliefs about the opponent or any relevant domain-specific knowledge, and is still free to use values that are extremely close to 0 as in THPE. Furthermore, a THPE is a refinement of NE, while $\alpha$-SE and NE are incomparable (an $\alpha$-SE may not be an NE and vice versa). Another related concept is that of a safe strategy and an $\epsilon$-safe strategy. A strategy for a player in a two-player zero-sum game is called safe if it guarantees an expected payoff of at least $v^*$, the value of the game to the player, in the worst case. Note that the set of safe strategies coincides with the set of minimax, maximin, and Nash equilibrium strategies. A strategy is $\epsilon$-safe if it obtains a worst-case expected payoff of at least $v^* - \epsilon$. The concepts of safe and $\epsilon$-safe strategies are defined only for two-player zero-sum games, while safe equilibrium and $\alpha$-safe equilibrium also apply to non-zero-sum and multiplayer games.
We note that a belief in opponents’ “irrationality” does not necessarily indicate that we believe them to be “stupid” or “crazy.” It may simply reflect a belief that the opponent has a different model of the game than we do. For example, our analysis may indicate that a successful attack on a location would result in a certain payoff for the opponent, while their analysis indicates a different payoff. In addition to potentially constructing different assessments of their own or other players’ payoffs, opponents may also be “irrational” because they are using an algorithm for computing a Nash equilibrium that is only able to yield an approximation, or just a different Nash equilibrium from what other players have calculated (in fact, these cases do not actually seem to be irrational at all, since computing a Nash equilibrium is computationally challenging and many games have multiple Nash equilibria). If any of these situations arises, then simply following an arbitrary Nash equilibrium strategy runs a risk of an extremely low payoff, and there is potential for significant benefit from ensuring a degree of safety.
An alternative approach for modeling potentially irrational opponents is to incorporate an opponent modeling algorithm. Opponent modeling algorithms typically require domain-specific expertise and databases of historical play to construct a prior distribution for opponents’ strategies, and use machine learning algorithms that predict a strategy (or distribution of strategies) for the opponents, taking into account the prior and observations of publicly observable actions. This can be extremely valuable if domain expertise, large amounts of historical data, and a large number of observations of opponents’ play are available. Often such information is not available, and we are forced to construct our strategy without any additional data on the opponent’s specific tendencies. We note that if such data is available, the safe equilibrium concept can be integrated with opponent modeling to achieve robust opponent exploitation. An approach called a restricted Nash response was developed for two-player zero-sum games, in which the opponent is restricted to play a fixed strategy $\sigma_{\text{fix}}$ determined by an opponent model with probability $p$ and plays a best response to us with probability $1 - p$, while we best respond to the opponent. This approach has been shown to be equivalent to playing an $\epsilon$-safe best response to $\sigma_{\text{fix}}$ (a best response to $\sigma_{\text{fix}}$ out of the strategies that are $\epsilon$-safe) for some $\epsilon$. It was shown that for certain values of $p$ this approach can result in a significant reduction in the exploitability of our own strategy with only a slight reduction in our degree of exploitation of the opponent’s strategy. It has also been shown that approaches that compute an $\epsilon$-safe best response to a model of the opponent’s strategy for dynamically changing values of $\epsilon$ in repeated two-player zero-sum games can guarantee safety.
An $\alpha$-safe equilibrium strategy can be used in non-zero-sum and multiplayer games where models are available for the opponents’ strategies, by assuming each opponent follows their opponent model with probability $\alpha_i$ instead of playing a worst-case strategy for us, while playing a best response with probability $1 - \alpha_i$. Thus, when an opponent model is available we can view safe equilibrium as a generalization of restricted Nash response that achieves robust opponent exploitation in non-zero-sum and multiplayer settings.
2 Safe Equilibrium
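A strategic-form game consists of a finite set of players $N = \{1, \ldots, n\}$, a finite set of pure strategies $S_i$ for each player $i \in N$, and a real-valued utility for each player for each strategy vector (also called a strategy profile), $u_i : \times_{j \in N} S_j \to \mathbb{R}$. A mixed strategy $\sigma_i$ for player $i$ is a probability distribution over pure strategies, where $\sigma_i(s_i)$ is the probability that player $i$ plays pure strategy $s_i \in S_i$ under $\sigma_i$. Let $\Sigma_i$ denote the full set of mixed strategies for player $i$. A strategy profile $\sigma^* = (\sigma^*_1, \ldots, \sigma^*_n)$ is a Nash equilibrium if $u_i(\sigma^*_i, \sigma^*_{-i}) \ge u_i(\sigma_i, \sigma^*_{-i})$ for all $\sigma_i \in \Sigma_i$ for all $i \in N$, where $\sigma^*_{-i}$ denotes the vector of the components of strategy $\sigma^*$ for all players excluding player $i$. Here $u_i$ denotes the expected utility for player $i$, and $\Sigma_{-i}$ denotes the set of strategy profiles for all players excluding player $i$.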
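A mixed strategy $\sigma^*_i$ for player $i$ is a maximin strategy if $\sigma^*_i \in \arg\max_{\sigma_i \in \Sigma_i} \min_{\sigma_{-i} \in \Sigma_{-i}} u_i(\sigma_i, \sigma_{-i})$.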
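Both definitions can be evaluated directly in a small two-player game. The following Python sketch is our own illustration, not code from this work, and its function names are hypothetical. Payoff matrices `A` (row player) and `B` (column player) are lists of lists. `max_regret` returns 0 exactly when a profile satisfies the Nash condition, and `worst_case_payoff` computes the inner minimum of the maximin objective. Minimizing over the opponent’s pure strategies suffices because expected payoff is linear in the opponent’s mixed strategy. The Chicken payoffs are an assumption, chosen so that the mixed equilibrium swerves with probability 0.9, and the grid search is only a crude stand-in for solving the maximin linear program.

```python
from fractions import Fraction as F

def row_values(A, y):
    """Row player's payoff for each pure row against the column mixed strategy y."""
    return [sum(a * q for a, q in zip(row, y)) for row in A]

def col_values(M, x):
    """Payoff (taken from matrix M) of each pure column against the row mixed strategy x."""
    return [sum(p * M[r][c] for r, p in enumerate(x)) for c in range(len(M[0]))]

def dot(p, v):
    return sum(a * b for a, b in zip(p, v))

def max_regret(A, B, x, y):
    """Largest gain either player can get by a unilateral deviation.
    The profile (x, y) satisfies the Nash condition exactly when this is 0."""
    u1, u2 = row_values(A, y), col_values(B, x)
    return max(max(u1) - dot(x, u1), max(u2) - dot(y, u2))

def worst_case_payoff(A, x):
    """Row player's guaranteed expected payoff from x (minimum over pure columns)."""
    return min(col_values(A, x))

# Assumed Chicken payoffs; strategy 0 = swerve, 1 = straight.
A = [[0, -1], [1, -10]]   # row player
B = [[0, 1], [-1, -10]]   # column player, B[r][c] = A[c][r]
ne = [F(9, 10), F(1, 10)]
print(max_regret(A, B, ne, ne))        # 0
print(worst_case_payoff(A, [1, 0]))    # -1 (always swerve)
print(worst_case_payoff(A, ne))        # -19/10

# Crude maximin search over row strategies (q, 1 - q) on a 1/100 grid.
best_q = max((F(k, 100) for k in range(101)),
             key=lambda q: worst_case_payoff(A, [q, 1 - q]))
print(best_q)                          # 1 (always swerve)
```

Under these assumed payoffs the mixed equilibrium strategy has a much worse guarantee (-19/10) than always swerving (-1), which illustrates the tension between the two concepts.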
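Definition 1. Let $G$ be a two-player strategic-form game. Let $\alpha = (\alpha_1, \alpha_2)$, where $0 \le \alpha_i \le 1$ for $i = 1, 2$. A strategy profile $\sigma^* = (\sigma^*_1, \sigma^*_2)$ is an $\alpha$-safe equilibrium if there exist mixed strategies $\sigma^1_i, \sigma^2_i$, where $\sigma^*_i = (1 - \alpha_i)\sigma^1_i + \alpha_i \sigma^2_i$ for $i = 1, 2$, such that
$$\sigma^1_i \in \arg\max_{\sigma_i \in \Sigma_i} u_i(\sigma_i, \sigma^*_{-i}), \qquad \sigma^2_i \in \arg\min_{\sigma_i \in \Sigma_i} u_{-i}(\sigma_i, \sigma^*_{-i}).$$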
In practice player $i$ would likely want to set $\alpha_i = 0$ and $\alpha_{-i} > 0$ when determining their own strategy, though Definition 1 allows an arbitrary value of $\alpha_i$ as well. It may make sense for player $i$ to set $\alpha_i > 0$ if they believe both that the opponent is irrational with some probability and that the opponent believes that player $i$ is irrational with some probability.
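Definition 1 can be read as a checkable certificate. Given the two components for each player, we verify a best-response condition and a payoff-minimization condition. The Python sketch below does this for a bimatrix game with exact rational arithmetic. It is our own illustration, and `is_alpha_safe` and the other names are hypothetical. A mixed strategy is a best response exactly when its expected payoff equals the maximum over pure strategies, and the same holds for minimization. The Chicken payoffs are an assumption.

```python
from fractions import Fraction as F

def row_values(M, y):
    """Payoff (from matrix M) of each pure row against the column mix y."""
    return [sum(m * q for m, q in zip(row, y)) for row in M]

def col_values(M, x):
    """Payoff (from matrix M) of each pure column against the row mix x."""
    return [sum(p * M[r][c] for r, p in enumerate(x)) for c in range(len(M[0]))]

def dot(p, v):
    return sum(a * b for a, b in zip(p, v))

def mix(a, p, q):
    """(1 - a) * p + a * q, componentwise."""
    return [(1 - a) * pi + a * qi for pi, qi in zip(p, q)]

def is_alpha_safe(A, B, alpha, br1, mn1, br2, mn2, tol=0):
    """Check Definition 1 for the bimatrix game (A, B). Player i plays
    (1 - alpha_i) * br_i + alpha_i * mn_i; br_i must best-respond to the
    opponent's full mixture and mn_i must minimize the opponent's payoff.
    Use tol > 0 only when passing floats instead of Fractions."""
    x = mix(alpha[0], br1, mn1)
    y = mix(alpha[1], br2, mn2)
    u1 = row_values(A, y)   # player 1's payoff per row, against y
    w1 = row_values(B, y)   # player 2's payoff when player 1 plays each row
    u2 = col_values(B, x)   # player 2's payoff per column, against x
    w2 = col_values(A, x)   # player 1's payoff when player 2 plays each column
    return (dot(br1, u1) >= max(u1) - tol and dot(mn1, w1) <= min(w1) + tol
            and dot(br2, u2) >= max(u2) - tol and dot(mn2, w2) <= min(w2) + tol)

A = [[0, -1], [1, -10]]   # assumed Chicken payoffs, row player (0 = swerve)
B = [[0, 1], [-1, -10]]   # column player
x = [F(9, 10), F(1, 10)]
straight = [0, 1]
# alpha_2 = 1/20: player 2's best-response part swerves w.p. 18/19 so that
# its overall mixture still swerves w.p. 9/10.
print(is_alpha_safe(A, B, (0, F(1, 20)), x, straight, [F(18, 19), F(1, 19)], straight))  # True
print(is_alpha_safe(A, B, (0, F(1, 5)), x, straight, [1, 0], straight))                  # False
print(is_alpha_safe(A, B, (0, F(1, 5)), [1, 0], straight, straight, straight))           # True
```

With these assumed payoffs and $\alpha_2 = 1/20$, player 1’s mixed strategy can still be supported because player 2’s best-response component shifts extra weight onto swerving. At $\alpha_2 = 1/5$ the same strategy fails the best-response test, while always swerving passes.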
Theorem 1. Let $G$ be a two-player strategic-form game, and let $\alpha = (\alpha_1, \alpha_2)$, where $0 \le \alpha_i \le 1$. Then $G$ contains an $\alpha$-safe equilibrium.
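Proof. Define $G'$ to be the following game: $N' = \{1, 2, 3, 4\}$, $S'_1 = S'_3 = S_1$, and $S'_2 = S'_4 = S_2$. For $s' \in S'$, define $u'_i(s')$ as follows for $i \in N'$:
$$
\begin{aligned}
u'_1(s') &= (1-\alpha_2)\,u_1(s'_1, s'_2) + \alpha_2\,u_1(s'_1, s'_4),\\
u'_2(s') &= (1-\alpha_1)\,u_2(s'_1, s'_2) + \alpha_1\,u_2(s'_3, s'_2),\\
u'_3(s') &= -\big[(1-\alpha_2)\,u_2(s'_3, s'_2) + \alpha_2\,u_2(s'_3, s'_4)\big],\\
u'_4(s') &= -\big[(1-\alpha_1)\,u_1(s'_1, s'_4) + \alpha_1\,u_1(s'_3, s'_4)\big].
\end{aligned}
$$
Player 1’s strategy corresponds to $\sigma^1_1$, player 2’s strategy corresponds to $\sigma^1_2$, player 3’s strategy corresponds to $\sigma^2_1$, and player 4’s strategy corresponds to $\sigma^2_2$. By Nash’s existence theorem, the game $G'$ has a Nash equilibrium $\sigma'$. Let $\sigma^*_1 = (1-\alpha_1)\sigma'_1 + \alpha_1\sigma'_3$ and $\sigma^*_2 = (1-\alpha_2)\sigma'_2 + \alpha_2\sigma'_4$. Since utilities are linear in each player’s mixed strategy, player 1’s expected payoff in $G'$ equals $u_1(\sigma'_1, \sigma^*_2)$ and player 3’s equals $-u_2(\sigma'_3, \sigma^*_2)$, and symmetrically for players 2 and 4. Hence $\sigma'$ corresponds to an $\alpha$-safe equilibrium $\sigma^*$ of $G$. ∎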
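The auxiliary game in this proof can be written out concretely. The Python sketch below gives one construction consistent with the stated correspondence. Players 1 and 2 of $G'$ act as the best-response components of the original players, and players 3 and 4 act as their worst-case components. The specific utility formulas and the function names are our own assumption, and Python indices 0 to 3 stand for players 1 to 4. The hypothetical `expected` helper confirms numerically that player 1’s expected payoff in $G'$ equals its payoff in $G$ against player 2’s $\alpha_2$-mixture.

```python
import itertools
from fractions import Fraction as F

def auxiliary_game(A, B, alpha):
    """Utilities u(i, s) of the four-player game G'. Indices 0, 1 are the
    best-response components of original players 1, 2; indices 2, 3 are their
    worst-case components. In s = (s1, s2, s3, s4), s1 and s3 index rows of
    A and B (player 1's strategies), s2 and s4 index columns (player 2's)."""
    a1, a2 = alpha

    def u(i, s):
        s1, s2, s3, s4 = s
        if i == 0:   # player 1's payoff against player 2's alpha_2-mixture
            return (1 - a2) * A[s1][s2] + a2 * A[s1][s4]
        if i == 1:   # player 2's payoff against player 1's alpha_1-mixture
            return (1 - a1) * B[s1][s2] + a1 * B[s3][s2]
        if i == 2:   # minus player 2's payoff when s3 faces player 2's mixture
            return -((1 - a2) * B[s3][s2] + a2 * B[s3][s4])
        if i == 3:   # minus player 1's payoff when s4 faces player 1's mixture
            return -((1 - a1) * A[s1][s4] + a1 * A[s3][s4])
        raise ValueError(f"no player with index {i}")

    return u

def expected(u, i, mixes):
    """Expected utility of player i when player j independently plays mixes[j]."""
    total = 0
    for s in itertools.product(*(range(len(m)) for m in mixes)):
        prob = 1
        for j, sj in enumerate(s):
            prob *= mixes[j][sj]
        total += prob * u(i, s)
    return total

A = [[0, -1], [1, -10]]   # assumed Chicken payoffs (0 = swerve, 1 = straight)
B = [[0, 1], [-1, -10]]
u = auxiliary_game(A, B, (F(1, 4), F(1, 2)))
mixes = [[F(1, 2), F(1, 2)], [F(1, 3), F(2, 3)], [0, 1], [F(1, 5), F(4, 5)]]
# Player 2's alpha_2-mixture of indices 1 and 3 is (4/15, 11/15); player 1 playing
# (1/2, 1/2) against it in G earns -39/10, matching its expected payoff in G'.
print(expected(u, 0, mixes))   # -39/10
```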
Theorem 2. Let $G$ be a two-player strategic-form game, and let $\alpha = (\alpha_1, \alpha_2)$, where $0 \le \alpha_i \le 1$ are fixed constants. The problem of computing an $\alpha$-safe equilibrium is PPAD-hard.
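Proof. Let $\sigma$ be a Nash equilibrium of $G$. Suppose that $m$ is the smallest possible payoff for any player in $G$, and let $M = m - 1$. Define the game $G'$ as follows: $N' = \{1, 2\}$ and $S'_i = S_i \cup \{s^*_i\}$. For $s \in S'$, define $u'_i$ as follows for $i \in \{1, 2\}$:
$$
u'_i(s) = \begin{cases} u_i(s) & \text{if } s \in S_1 \times S_2,\\ M - 1 & \text{if } s_i = s^*_i,\\ M & \text{if } s_i \in S_i \text{ and } s_{-i} = s^*_{-i}. \end{cases}
$$
Define $\sigma^1_i = \sigma_i$, $\sigma^2_i = s^*_i$, and $\sigma^*_i = (1-\alpha_i)\sigma^1_i + \alpha_i\sigma^2_i$ for $i \in \{1, 2\}$. Clearly $\sigma^1_i \in \arg\max_{\sigma_i \in \Sigma'_i} u'_i(\sigma_i, \sigma^*_{-i})$: $\sigma$ is a Nash equilibrium of $G$, and $s^*_i$ is strictly dominated for both players in $G'$. Moreover, every strategy in $S_i$ earns the same payoff $M$ against $s^*_{-i}$, so $s^*_{-i}$ can be removed without any effect on best responses. It is also clear that $\sigma^2_i \in \arg\min_{\sigma_i \in \Sigma'_i} u'_{-i}(\sigma_i, \sigma^*_{-i})$, since $s^*_i$ minimizes the opposing player’s payoff against any possible strategy of that player. This shows that $\sigma^*$ is an $\alpha$-safe equilibrium of $G'$. Since the problem of computing a Nash equilibrium is PPAD-hard and we have reduced it to the problem of computing an $\alpha$-safe equilibrium, the problem of computing an $\alpha$-safe equilibrium is PPAD-hard. ∎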
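The reduction can also be checked mechanically on a small example. The Python sketch below builds $G'$ from a bimatrix game by appending one new strategy $s^*$ per player. The payoff values for $s^*$ are our assumption: $M - 1$ for the player using it, and $M = m - 1$ for a player facing it. They form a gadget with the two properties the proof relies on. First, $s^*$ is strictly dominated for its owner. Second, it minimizes the opponent’s payoff against every opponent strategy. The sketch then confirms that a Nash equilibrium of the assumed Chicken game, mixed with $s^*$, passes the $\alpha$-safe equilibrium test in $G'$. All function names are hypothetical.

```python
from fractions import Fraction as F

def augment(A, B):
    """Append one new strategy s* (the last index) for each player.
    Assumed gadget: the player using s* gets M - 1, a player facing s* without
    using it gets M, where M = m - 1 and m is the smallest payoff in (A, B)."""
    M = min(v for row in A + B for v in row) - 1
    R, C = len(A), len(A[0])
    A2 = [[0] * (C + 1) for _ in range(R + 1)]
    B2 = [[0] * (C + 1) for _ in range(R + 1)]
    for r in range(R + 1):
        for c in range(C + 1):
            if r < R and c < C:           # original payoffs are unchanged
                A2[r][c], B2[r][c] = A[r][c], B[r][c]
            else:
                A2[r][c] = M - 1 if r == R else M
                B2[r][c] = M - 1 if c == C else M
    return A2, B2

def values(M, opp, own_is_row):
    """Payoffs from matrix M of each own pure strategy against the opponent's mix."""
    if own_is_row:
        return [sum(m * q for m, q in zip(row, opp)) for row in M]
    return [sum(p * M[r][c] for r, p in enumerate(opp)) for c in range(len(M[0]))]

def dot(p, v):
    return sum(a * b for a, b in zip(p, v))

def is_alpha_safe(A, B, alpha, br, mn):
    """Exact check of the two-player definition; br/mn hold each player's components."""
    x, y = [[(1 - a) * b + a * m for b, m in zip(br[i], mn[i])]
            for i, a in enumerate(alpha)]
    u1, w1 = values(A, y, True), values(B, y, True)
    u2, w2 = values(B, x, False), values(A, x, False)
    return (dot(br[0], u1) == max(u1) and dot(mn[0], w1) == min(w1)
            and dot(br[1], u2) == max(u2) and dot(mn[1], w2) == min(w2))

A = [[0, -1], [1, -10]]   # assumed Chicken payoffs (0 = swerve, 1 = straight)
B = [[0, 1], [-1, -10]]
A2, B2 = augment(A, B)
ne = [F(9, 10), F(1, 10), 0]   # Nash equilibrium of G, with zero weight on s*
star = [0, 0, 1]               # the new strategy s*
print(is_alpha_safe(A2, B2, (F(1, 3), F(1, 2)), (ne, ne), (star, star)))  # True
```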
For $n$ players, we designate one of the players as being a special player, say player 1. Player 1 then best responds to the strategy profile of all other players, while each opposing player $i$ mixes between playing a strategy that minimizes player 1’s payoff and a strategy that maximizes player $i$’s payoff in response to the strategy profile of the other players.
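Let $G$ be an $n$-player strategic-form game. Let $\alpha = (\alpha_2, \ldots, \alpha_n)$, where $0 \le \alpha_i \le 1$. A strategy profile $\sigma^*$ is an $\alpha$-safe equilibrium if there exist a mixed strategy $\sigma^*_1$ for player 1 and mixed strategies $\sigma^1_i, \sigma^2_i$, where $\sigma^*_i = (1-\alpha_i)\sigma^1_i + \alpha_i\sigma^2_i$ for $2 \le i \le n$, such that
$$\sigma^*_1 \in \arg\max_{\sigma_1 \in \Sigma_1} u_1(\sigma_1, \sigma^*_{-1}), \qquad \sigma^1_i \in \arg\max_{\sigma_i \in \Sigma_i} u_i(\sigma_i, \sigma^*_{-i}), \qquad \sigma^2_i \in \arg\min_{\sigma_i \in \Sigma_i} u_1\big(\sigma^*_1, (\sigma_i, \sigma^*_{-\{1,i\}})\big),$$
where $(\sigma_i, \sigma^*_{-\{1,i\}})$ is the strategy profile for players 2 through $n$ in which player $i$ plays $\sigma_i$ and each other player $j$ plays $\sigma^*_j$.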
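For $n$ players the conditions involve expected payoffs over product distributions. In small games these can be computed by brute-force enumeration of pure profiles. The Python sketch below is our own hypothetical checker, with players indexed from 0 so that index 0 is the special player 1. It is exponential in $n$ and meant only to make the three conditions concrete. The usage example embeds the assumed Chicken payoffs and adds a dummy third player whose choices affect nobody.

```python
import itertools
from fractions import Fraction as F

def expected(u, i, mixes):
    """Expected payoff to player i when player j independently plays mixes[j]."""
    total = 0
    for s in itertools.product(*(range(len(m)) for m in mixes)):
        prob = 1
        for j, sj in enumerate(s):
            prob *= mixes[j][sj]
        if prob:
            total += prob * u(i, s)
    return total

def is_alpha_safe_n(u, alpha, sigma1, br, mn):
    """Index 0 is the special player (player 1 in the text). For i >= 1, player i
    plays (1 - alpha[i]) * br[i] + alpha[i] * mn[i]; index-0 entries of alpha,
    br and mn are unused placeholders."""
    n = len(br)
    prof = [sigma1] + [[(1 - alpha[i]) * b + alpha[i] * m for b, m in zip(br[i], mn[i])]
                       for i in range(1, n)]

    def swap(i, strat):
        # profile in which player i plays strat and everyone else keeps prof
        return prof[:i] + [strat] + prof[i + 1:]

    def pure_payoffs(i, payee):
        # payee's expected payoff when player i switches to each pure strategy
        k_max = len(prof[i])
        return [expected(u, payee, swap(i, [int(a == k) for a in range(k_max)]))
                for k in range(k_max)]

    if expected(u, 0, prof) != max(pure_payoffs(0, 0)):
        return False                  # the special player must best-respond
    for i in range(1, n):
        if expected(u, i, swap(i, br[i])) != max(pure_payoffs(i, i)):
            return False              # br[i] must best-respond to the profile
        if expected(u, 0, swap(i, mn[i])) != min(pure_payoffs(i, 0)):
            return False              # mn[i] must minimize the special player's payoff
    return True

A = [[0, -1], [1, -10]]   # assumed Chicken payoffs (0 = swerve, 1 = straight)
B = [[0, 1], [-1, -10]]

def u(i, s):
    """Players 0 and 1 play Chicken; player 2 is a dummy that affects nobody."""
    return A[s[0]][s[1]] if i == 0 else B[s[0]][s[1]] if i == 1 else 0

x, straight, half = [F(9, 10), F(1, 10)], [0, 1], [F(1, 2), F(1, 2)]
print(is_alpha_safe_n(u, [None, F(1, 20), F(1, 3)], x,
                      [None, [F(18, 19), F(1, 19)], half], [None, straight, half]))  # True
```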
The proof of Theorem 1 extends naturally to $n$ players by creating a $(2n-1)$-player game with 2 new players corresponding to each player $i$ in the initial game for $2 \le i \le n$, plus player 1.
Let $G$ be an $n$-player strategic-form game, and let $\alpha = (\alpha_2, \ldots, \alpha_n)$, where $0 \le \alpha_i \le 1$. Then $G$ contains an $\alpha$-safe equilibrium.
The proof of Theorem 2 also straightforwardly extends to $n$ players.
Let $G$ be an $n$-player strategic-form game, and let $\alpha = (\alpha_2, \ldots, \alpha_n)$, where $0 \le \alpha_i \le 1$ are fixed constants. The problem of computing an $\alpha$-safe equilibrium is PPAD-hard.
As an example, consider the classic game of Chicken, with payoffs given by Figure 1. The first action for each player corresponds to the “swerve” action, while the second corresponds to the “straight” action.
The game of Chicken models two drivers, both headed for a single-lane bridge from opposite directions. The first to swerve away yields the bridge to the other. If neither player swerves, the result is a costly deadlock in the middle of the bridge, or a potentially fatal head-on collision. It is presumed that the best thing for each driver is to stay straight while the other swerves (since the other is the “chicken” while a crash is avoided). Additionally, a crash is presumed to be the worst outcome for both players. This yields a situation where each player, in attempting to secure their best outcome, risks the worst.
The unique mixed-strategy Nash equilibrium in the Chicken game is for each player to swerve with probability 0.9 (there are also two pure-strategy equilibria where one player swerves and the other player doesn’t), and the unique maximin strategy is to swerve with probability 1. If we set $\alpha_1 = 0$, then it turns out that the mixed Nash equilibrium strategy (swerve with probability 0.9) is an $\alpha$-safe equilibrium strategy for player 1 when $\alpha_2$ is sufficiently small. The maximin strategy (swerve with probability 1) is an $\alpha$-safe equilibrium strategy for player 1 when $\alpha_2$ is larger.
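The threshold behavior can be derived by hand once payoffs are fixed. The sketch below assumes symmetric payoffs: 0 for mutual swerving, -1 for swerving against a driver who goes straight (and 1 for that driver), and -10 for a crash. These values are our assumption, chosen because they produce the stated mixed equilibrium of swerving with probability 0.9. Under them, with $\alpha_1 = 0$, player 2’s worst-case component is always “straight”. The mixed strategy therefore survives only if player 2’s best-response component can swerve with probability $0.9/(1-\alpha_2) \le 1$, that is, $\alpha_2 \le 0.1$. This threshold is a property of the assumed payoffs, not a reported result. The function names are hypothetical.

```python
from fractions import Fraction as F

# Assumed Chicken payoffs for the row player (0 = swerve, 1 = straight). The
# game is symmetric, so the column player's payoff at (r, c) is A[c][r].
A = [[F(0), F(-1)], [F(1), F(-10)]]
P = F(9, 10)  # swerve probability in the mixed Nash equilibrium for these payoffs
SWERVE, STRAIGHT = [F(1), F(0)], [F(0), F(1)]

def u1(x, y):
    """Row player's expected payoff for mixed strategies x (row) and y (column)."""
    return sum(x[r] * y[c] * A[r][c] for r in range(2) for c in range(2))

def u2(x, y):
    """Column player's expected payoff, using the symmetry of the game."""
    return sum(x[r] * y[c] * A[c][r] for r in range(2) for c in range(2))

def ne_strategy_is_alpha_safe(alpha2):
    """alpha_1 = 0: can player 1 swerve w.p. P in an alpha-safe equilibrium?"""
    x = [P, 1 - P]
    # Player 2's worst-case component must minimize u1 against x; it is "straight".
    assert u1(x, STRAIGHT) < u1(x, SWERVE)
    if alpha2 == 1:
        return False
    # Player 1 only mixes if indifferent, so player 2's overall mixture must swerve
    # w.p. P: (1 - alpha2) * q = P, where q is the best-response part's swerve prob.
    q = P / (1 - alpha2)
    # Player 2 is indifferent against x, so any q in [0, 1] is a best response.
    assert u2(x, SWERVE) == u2(x, STRAIGHT)
    return q <= 1

def always_swerve_is_alpha_safe(alpha2):
    """alpha_1 = 0: is "always swerve" an alpha-safe strategy for player 1?"""
    # Against a sure swerver, player 2's best response and its worst-case
    # strategy for player 1 are both "straight", so player 2 goes straight
    # whatever alpha2 is; player 1 must then best-respond to "straight".
    assert u2(SWERVE, STRAIGHT) > u2(SWERVE, SWERVE)
    assert u1(SWERVE, STRAIGHT) < u1(SWERVE, SWERVE)
    return u1(SWERVE, STRAIGHT) >= u1(STRAIGHT, STRAIGHT)

grid = [F(k, 100) for k in range(101)]
print(max(a for a in grid if ne_strategy_is_alpha_safe(a)))   # 1/10
print(all(always_swerve_is_alpha_safe(a) for a in grid))       # True
```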
An $\alpha$-safe equilibrium strategy need not correspond to a Nash equilibrium or maximin strategy. For example, consider the security game depicted in Figure 2, where the row player selects one of three targets to defend while the column player selects a target to attack. Player 1 (the row player) has a Nash equilibrium strategy and a maximin strategy that defend the three targets with different probabilities. Again using $\alpha_1 = 0$, it turns out that the Nash equilibrium strategy is an $\alpha$-safe equilibrium strategy for player 1 for one range of $\alpha_2$, and the maximin strategy is one for another range. But for a third region of $\alpha_2$ values, player 1 has an $\alpha$-safe equilibrium strategy that is neither a Nash equilibrium strategy nor a maximin strategy.
3 Algorithms for computing safe equilibrium
We first present an exact algorithm for computing an $\alpha$-safe equilibrium, followed by an approximation algorithm that runs quickly on large instances. The exact algorithm is based on a mixed-integer feasibility program formulation. We first present the algorithm for two players, for arbitrary $\alpha$. The algorithm builds on a related linear mixed-integer feasibility program formulation for computing Nash equilibrium in two-player general-sum games.
We quote from the original description of the program formulation for two-player Nash equilibrium, and present the formulation below:
In our first formulation, the feasible solutions are exactly the equilibria of the game. For every pure strategy $s_i$, there is a binary variable $b_{s_i}$. If this variable is set to 1, the probability placed on the strategy must be 0. If it is set to 0, the strategy is allowed to be in the support, but the regret of the strategy must be 0. The formulation has the following variables other than the $b_{s_i}$. For each player, there is a variable $u_i$ indicating the highest possible expected utility that that player can obtain given the other player’s mixed strategy. For every pure strategy $s_i$, there is a variable $p_{s_i}$ indicating the probability placed on that strategy, a variable $u_{s_i}$ indicating the expected utility of playing that strategy (given the other player’s mixed strategy), and a variable $r_{s_i}$ indicating the regret of playing $s_i$. The constant $U_i$ indicates the maximum difference between two utilities in the game for player $i$: $U_i = \max_{s^h_i, s^l_i \in S_i,\; s^h_{-i}, s^l_{-i} \in S_{-i}} u_i(s^h_i, s^h_{-i}) - u_i(s^l_i, s^l_{-i})$. The formulation follows below.
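Find $p_{s_i}, u_i, u_{s_i}, r_{s_i}, b_{s_i}$ such that:
$$
\begin{aligned}
&\textstyle\sum_{s_i \in S_i} p_{s_i} = 1 && \forall i && (1)\\
&u_{s_i} = \textstyle\sum_{s_{-i} \in S_{-i}} p_{s_{-i}}\, u_i(s_i, s_{-i}) && \forall i,\ s_i \in S_i && (2)\\
&u_i \ge u_{s_i} && \forall i,\ s_i \in S_i && (3)\\
&r_{s_i} = u_i - u_{s_i} && \forall i,\ s_i \in S_i && (4)\\
&p_{s_i} \le 1 - b_{s_i} && \forall i,\ s_i \in S_i && (5)\\
&r_{s_i} \le U_i\, b_{s_i} && \forall i,\ s_i \in S_i && (6)\\
&p_{s_i} \ge 0 && \forall i,\ s_i \in S_i && (7)\\
&u_{s_i} \ge 0 && \forall i,\ s_i \in S_i && (8)\\
&u_i \ge 0 && \forall i && (9)\\
&r_{s_i} \ge 0 && \forall i,\ s_i \in S_i && (10)\\
&b_{s_i} \in \{0, 1\} && \forall i,\ s_i \in S_i && (11)
\end{aligned}
$$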
The first four constraints ensure that the $p_{s_i}$ values constitute a valid probability distribution and define the regret of a strategy. Constraint 5 ensures that $b_{s_i}$ can be set to 1 only when no probability is placed on $s_i$. On the other hand, Constraint 6 ensures that the regret of a strategy equals 0, unless $b_{s_i} = 1$, in which case the constraint is vacuous because the regret can never exceed $U_i$. (Technically, Constraint 3 is redundant as it follows from Constraints 4 and 10.)
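Once the probabilities are fixed, $u_i$, $u_{s_i}$ and $r_{s_i}$ are determined. A candidate pair of mixed strategies and binary vectors can therefore be checked against the quoted Nash-equilibrium feasibility program without a MILP solver. The Python sketch below is our own, with hypothetical names, and works on a bimatrix game. It follows the constraint numbering of that program. (1): probabilities sum to 1. (2): expected utilities. (3) and (4): best value and regret. (5) and (6): the binary linking constraints with big-M constant $U_i$. (7) to (11): sign and integrality constraints. The formulation assumes non-negative utilities, so the example shifts the assumed Chicken payoffs by +10.

```python
from fractions import Fraction as F

def satisfies_formulation(A, B, p, b):
    """Can mixed strategies p = (p1, p2) and binary vectors b = (b1, b2) be
    completed to a feasible point of the Nash-equilibrium feasibility program?"""
    # Player i's payoffs with its own pure strategies as rows.
    mats = (A, [list(col) for col in zip(*B)])
    for i in (0, 1):
        M, own, opp, bi = mats[i], p[i], p[1 - i], b[i]
        U = max(map(max, M)) - min(map(min, M))                    # constant U_i
        us = [sum(q * v for q, v in zip(opp, row)) for row in M]   # (2)
        # (1), (5) and (6) force some strategy to have zero regret, and (3) forces
        # u_i >= every u_{s_i}, so u_i must equal the largest u_{s_i}.
        ui = max(us)
        r = [ui - v for v in us]                                   # (3), (4)
        if sum(own) != 1 or ui < 0:                                # (1), (9)
            return False
        for ps, v, rs, bs in zip(own, us, r, bi):
            if bs not in (0, 1) or ps < 0 or v < 0 or rs < 0:      # (11), (7), (8), (10)
                return False
            if ps > 1 - bs or rs > U * bs:                         # (5), (6)
                return False
    return True

# Assumed Chicken payoffs shifted by +10 so that all utilities are non-negative.
A = [[10, 9], [11, 0]]
B = [[10, 11], [9, 0]]
mixed = [F(9, 10), F(1, 10)]
print(satisfies_formulation(A, B, (mixed, mixed), ([0, 0], [0, 0])))    # True
print(satisfies_formulation(A, B, ([1, 0], [0, 1]), ([0, 1], [1, 0])))  # True
print(satisfies_formulation(A, B, ([1, 0], [1, 0]), ([0, 1], [0, 1])))  # False
```

The last profile (both players swerve) fails Constraint 6, because swerving has positive regret against a swerving opponent while its binary variable is 0.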
We modify this program as follows. For every pure strategy , we include two binary variables . The first one corresponds to player ’s best response strategy, and the second corresponds to player ’s strategy that minimizes the opponent’s payoff. Additionally, we include variables and constants for Given constants , we create the following formulation for computing an -safe equilibrium (note that we have removed the redundant Constraint 3).
Find such that:
For three players, we can use the following formulation, where new variables denote the product of the variables and For the special player 1 we just have superscript 1, while for players 2 and 3 we have superscripts 1 and 2 (corresponding to the best-response strategy and the strategy that is worst-case for player 1). This formulation can be straightforwardly extended to a non-convex quadratically-constrained mixed-integer feasibility program formulation for players, and is based on a recent algorithm for computing multiplayer Nash equilibrium .
Find subject to:
This provides an exact algorithm for computing $\alpha$-safe equilibrium in $n$-player games. Next we consider an approximation algorithm that scales to large games. Two algorithms that have recently been applied to approximate Nash equilibrium in large multiplayer games are (counterfactual) regret minimization and fictitious play [1, 9]. These are iterative self-play procedures that have been proven to converge to Nash equilibrium in two-player zero-sum games, but not in games with more than two players. Recently it has been shown that fictitious play outperforms regret minimization in multiplayer games, so we base our algorithms on fictitious play. Algorithm 1 presents our algorithm for computing $\alpha$-safe equilibrium in two-player games, and Algorithm 2 presents our algorithm for $n$-player games. Note that the best-response and worst-case component strategies $\sigma^1_i$ and $\sigma^2_i$ are not actually needed in the algorithms, but they will be useful for evaluating the algorithms in our experiments.
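To illustrate the fictitious-play idea in the two-player setting, the following Python sketch gives one plausible variant. We wrote it ourselves, and it is not necessarily identical to Algorithm 1. Each round, player $i$ responds to the opponent’s empirical mixture by adding weight $1 - \alpha_i$ to a pure best response and weight $\alpha_i$ to a pure strategy that minimizes the opponent’s payoff. The uniform initial counts, simultaneous updates, and first-index tie-breaking are our assumptions, as are the function names. As with fictitious play in general-sum games, convergence is not guaranteed.

```python
def row_values(M, y):
    """Payoffs from M of each pure row against the column mix y."""
    return [sum(m * q for m, q in zip(row, y)) for row in M]

def col_values(M, x):
    """Payoffs from M of each pure column against the row mix x."""
    return [sum(p * M[r][c] for r, p in enumerate(x)) for c in range(len(M[0]))]

def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

def fictitious_play_alpha_se(A, B, alpha, iters):
    """Fictitious-play-style sketch: in each round, player i answers the
    opponent's empirical mixture by adding weight 1 - alpha_i to a pure best
    response and weight alpha_i to a pure strategy minimizing the opponent's
    payoff. Returns both players' empirical mixtures."""
    cnt1 = [1.0] * len(A)        # uniform initial counts (our assumption)
    cnt2 = [1.0] * len(A[0])
    for _ in range(iters):
        x, y = normalize(cnt1), normalize(cnt2)
        u1, w1 = row_values(A, y), row_values(B, y)   # own / opponent payoffs per row
        u2, w2 = col_values(B, x), col_values(A, x)   # own / opponent payoffs per column
        cnt1[u1.index(max(u1))] += 1 - alpha[0]
        cnt1[w1.index(min(w1))] += alpha[0]
        cnt2[u2.index(max(u2))] += 1 - alpha[1]
        cnt2[w2.index(min(w2))] += alpha[1]
    return normalize(cnt1), normalize(cnt2)

A = [[0, -1], [1, -10]]   # assumed Chicken payoffs (0 = swerve, 1 = straight)
B = [[0, 1], [-1, -10]]
x, y = fictitious_play_alpha_se(A, B, (0.0, 0.5), 1000)
print([round(v, 3) for v in x], [round(v, 3) for v in y])
```

With $\alpha = (0, 0.5)$ and these assumed payoffs, player 2 goes straight at least half the time, so player 1 keeps swerving. The empirical mixtures then approach the maximin-style outcome in which player 1 swerves and player 2 goes straight.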
This is true for by the definition of Now suppose for all , for some .
For the first set of experiments we investigate the runtime of our exact two-player algorithm as the number of pure strategies per player varies. For these experiments we set , We used an Intel Core i7-8550U at 1.80 GHz with 16 GB of RAM under 64-bit Windows 11 (8 threads). We used Gurobi version 9.5 . We experimented on games with all payoffs uniformly random in [0,1]. For we ran 10,000 trials, and for we ran 1,000. Here refers to the number of pure strategies per player (note that we experiment on games where all players have the same number of pure strategies, while our solution concepts and algorithms also apply to games where the players can have different numbers of pure strategies). The results in Table 1 indicate that the algorithm runs in less than a second for up to .
|Avg. time(s)||Median time(s)|
Next we experimented with the exact three-player algorithm, using Again we used Gurobi 9.5 with 8 cores on a laptop. For these experiments we used Gurobi’s non-convex quadratic solver. For we ran 1,000 trials, and for we ran 100. The results in Table 2 indicate that the algorithm runs in a fraction of a second for and several seconds for
|Avg. time(s)||Median time(s)|
We next experimented with our 2-player approximation algorithm (Algorithm 1). Again we used , For these experiments we just used a single core (note that the algorithm can be parallelized which would result in even lower runtimes). For each value of we ran 10,000 trials, performing 10,000 iterations of Algorithm 1 for each trial. The results in Table 3 indicate that the algorithm runs in just a fraction of a second for
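For player $i$ define
$$\epsilon^1_i = \max_{\sigma_i \in \Sigma_i} u_i(\sigma_i, \sigma^*_{-i}) - u_i(\sigma^1_i, \sigma^*_{-i}).$$
That is, $\epsilon^1_i$ denotes the difference between the payoff of playing a best response to $\sigma^*_{-i}$ and the payoff of following $\sigma^1_i$. Then define $\epsilon^1 = \max_i \epsilon^1_i$. Similarly, define $\epsilon^2_i = u_{-i}(\sigma^2_i, \sigma^*_{-i}) - \min_{\sigma_i \in \Sigma_i} u_{-i}(\sigma_i, \sigma^*_{-i})$ and $\epsilon^2 = \max_i \epsilon^2_i$.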
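These gap measures are straightforward to compute for a candidate decomposition of a bimatrix game. The Python sketch below uses hypothetical names and the assumed Chicken payoffs, and returns a pair $(\epsilon^1, \epsilon^2)$. It takes $\epsilon^1$ as the maximum over both players of the best-response gap of $\sigma^1_i$, and $\epsilon^2$ as the maximum over both players of the minimization gap of $\sigma^2_i$. Taking the maximum over players is our assumption. Both values are 0 exactly when the candidate components satisfy the two-player $\alpha$-safe equilibrium conditions.

```python
from fractions import Fraction as F

def row_values(M, y):
    """Payoffs from M of each pure row against the column mix y."""
    return [sum(m * q for m, q in zip(row, y)) for row in M]

def col_values(M, x):
    """Payoffs from M of each pure column against the row mix x."""
    return [sum(p * M[r][c] for r, p in enumerate(x)) for c in range(len(M[0]))]

def dot(p, v):
    return sum(a * b for a, b in zip(p, v))

def alpha_se_gaps(A, B, alpha, br, mn):
    """Return (eps1, eps2) for a candidate decomposition of a bimatrix game:
    eps1 = max_i [best pure payoff - payoff of br[i]] (best-response gap),
    eps2 = max_i [opponent's payoff under mn[i] - its minimum] (minimization gap)."""
    x, y = [[(1 - a) * b + a * m for b, m in zip(br[i], mn[i])]
            for i, a in enumerate(alpha)]
    u1, u2 = row_values(A, y), col_values(B, x)   # each player's own payoffs
    w1, w2 = row_values(B, y), col_values(A, x)   # the opponent's payoffs
    eps1 = max(max(u1) - dot(br[0], u1), max(u2) - dot(br[1], u2))
    eps2 = max(dot(mn[0], w1) - min(w1), dot(mn[1], w2) - min(w2))
    return eps1, eps2

A = [[0, -1], [1, -10]]   # assumed Chicken payoffs (0 = swerve, 1 = straight)
B = [[0, 1], [-1, -10]]
x = [F(9, 10), F(1, 10)]
print(alpha_se_gaps(A, B, (0, F(1, 5)), (x, [1, 0]), ([0, 1], [0, 1])))
# (Fraction(1, 10), Fraction(0, 1))
```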