Since the seminal work of vonNeumann1947, the term rational has become synonymous with expected utility maximization. Whether in game-theoretic situations or simply in decision-making under uncertainty, the only agent considered rational is the one who attempts to maximize their mean utility, no matter how many trials are likely to be necessary for the realized value to resemble the expected value. However, consider an agent faced with multiple options, one of which has maximum expected utility but will bankrupt them with high probability if it fails. In the event of failure, the lack of funds will severely limit any future options the agent may have. For such an agent, the fact that the opportunity has maximum expected value among the options cannot be the only relevant factor in deciding whether to pursue it. If the opportunity does not lead to success, the agent will not be able to pursue any later actions, as they will not have the funds necessary to do so. As a result, agents should not rely solely on expected utility and must also consider the probability that the opportunity succeeds.
This observation applies to almost all stochastic decision-making situations, including competitive situations best modeled through game theory. To see this, consider a market composed of only a few large firms and a smaller firm considering how to compete with large firms or whether to even enter the market. We take as our example the smartphone industry, in which large companies such as Apple, Samsung, Google, LG, Motorola, Amazon, and Microsoft have all competed in recent years. While Apple and Samsung are market leaders at the time of writing, both have undergone expensive setbacks. Apple’s iPhone 5 was widely criticized due to issues with the Apple Maps application and Samsung had to recall its Galaxy Note 7 due to its batteries catching fire, costing an estimated 3 billion USD (swider2016), in what may have been an attempt to improve on the criticized battery life of their Galaxy S6. Similarly, Google’s original Nexus line of phones dropped in popularity to the point where the company went to the expense of creating a new line of Pixel phones rather than continuing the Nexus. Amazon and Microsoft were forced out of the market entirely, with Amazon’s Fire Phone lasting just over a year (July 2014 - August 2015) between release and the cessation of production, causing a loss of at least 170 million USD for Amazon’s 2014 Q3 alone (mccallion2014). Microsoft meanwhile acquired Nokia for 7.2 billion USD in an attempt to become more competitive in the market (warren2016), but ceased mobile device production entirely only a few years later.
Despite the cost of the setbacks mentioned above, each of these companies is still valuable, with Apple and Microsoft having market caps of over 1 trillion USD at the time of writing and Amazon recently passing that milestone as well. Samsung is worth approximately 300 billion USD at the time of writing, and, while they are smaller, LG and Motorola are quite valuable as well, worth approximately 14.5 billion and 30 billion USD, respectively. Because of their size, each of these companies was able to take risks to compete with the others; these risks, although expected to end in a positive outcome, resulted in expensive losses. Indeed, Microsoft currently appears to be preparing for another attempt to enter the smartphone market with the Surface Duo. In other words, these companies are still able to compete with each other by making products which maximize their expected values because they are large enough that they can afford to wait for the law of large numbers to take effect. This allows their competition to be modeled through a traditional game theoretic framework.
In contrast, consider a company with a smaller valuation, say 500 million USD, deciding whether to compete in the smartphone market. If such a company attempted to do so, it would have to commit most if not all of its resources to the attempt. Even if such a strategy has a large positive expected value, it has a large risk of bankrupting the company, as seen with the scale of the losses incurred by Samsung, Amazon, and Microsoft. More generally, firms in markets where the cost of competition is a significant portion of the value of the firm itself must consider more than just maximizing their expected value. A misstep in such a setting means that the firm is out of the market and unable to compete further. This highlights an important facet of competition with random or unknown variables; i.e., it is not just the expected value of a strategy that is important, it is how many times you get to compete.
In this paper, we build a new framework to apply this observation to game theoretic situations. We consider stochastic games drawn from known distributions in which players engage once or for a given finite number of times. Because of the finite number of times that players engage in these games, given the strategies of all other players, expected utility may not be a suitable metric for a player to attempt to maximize. Instead, we formulate a new definition of a risk-averse best response, where given the strategies of all other agents, an agent chooses to play the strategy that is most likely to have the highest utility in a single realization of the stochastic game. While the mathematical particulars of this definition will be discussed in Section 3, conceptually it can best be understood through the lens of prospect theory.
In its most basic form, prospect theory states that consumers prefer choices with lower volatility, even when this results in lower expected utility. An excellent example of this is retirement planning where there are many highly volatile assets which in expectation provide a large return on investment, but which also have a high chance of dropping in value due to their volatility. Most individuals try to avoid investing too much in these assets, receiving a lower average return in order to avoid the chance of a significant loss. Similarly, a risk-averse best response as we have loosely defined it so far would possibly limit the expected return of assets in order to maximize the probability of making the most profit.
The rest of this paper is organized as follows. Section 2 provides a thorough review of related work relevant to this topic, in particular a more detailed discussion of prospect theory. Section 3 provides the formal mathematical definition of our proposed risk-averse equilibrium, with several subsections detailing topics such as equilibrium properties, computation, and worked-out examples. Section 4 considers finite-time commit games and how the risk-averse equilibria shift as the number of times the games are played increases. Section 5 lays out a comparison between the classical Nash equilibrium and the proposed risk-averse equilibrium through simulation. Finally, Section 6 contains our concluding remarks as well as future directions in which to advance this research.
2 Related Work
Since the seminal work of vonNeumann1928 and later nash1950, expected utility has emerged as the dominant objective value within game theory as each player attempts to maximize his/her expected utility given the actions of other players. This concept was extended naturally into games of incomplete information (Bayesian games) by harsanyi1967, as players can still maximize their expected utility given a distribution from which the game will be drawn. These games have received a great deal of attention as they more accurately model real-world situations where not all parameters are known precisely, with later works such as wiseman2005 addressing how players sequentially refine their equilibria as they learn the distributions and the more recent mertikopoulos2019 addressing how players learn their payoffs with continuous action sets. Another recent work (sugaya2019) considers the more specific question of how firms in a duopoly should play when the payoff distributions are based on the market state, a random variable with possibly unknown distribution.
Despite all the work that has gone into expected utility as the objective value players wish to maximize, it is still questionable whether this is a good assumption. goeree2003 present an empirical study of risk-aversion in the matching-pennies game, where they observe marked deviations from Nash behavior (expected utility maximization) as the payoffs/costs become larger. This is consistent with the concept of prospect theory, which is based on empirical observations across several experiments in which the subjects deviate from actions which would maximize their expected utility. kahneman1979prospecttheory formulated the idea of prospect theory, which states that consumers are naturally risk-averse when addressing situations with potential gains and naturally risk-seeking when facing situations with potential losses. Prospect theory has since been widely studied, with an extension of the original paper provided in tversky1992prospectcumulative to address more general payoff/cost functions. levy1992politic_prospect provides a good overview of classical prospect theory, particularly from a political perspective. Unsurprisingly, prospect theory has received a great deal of attention in financial studies (baele2018, barberis2016), with barberis2001 using it for asset pricing. Prospect theory is not without its critics; e.g., list2004 posits that the results of the studies on prospect theory are due to inexperienced consumers, and designs an experiment to show these behaviors disappear with experience. However, experienced consumers are by definition consumers who engage in similar trials multiple times, which means that for these consumers expected utility is an appropriate metric. As we are explicitly interested in games which will be played at most a small number of times, we do not need to be concerned with this effect.
3 Risk-Averse Equilibrium
The problem is formulated in the following subsection.
3.1 Problem Statement
Consider a game that consists of a finite set of players, , where player has a set of possible pure strategies (or actions, used interchangeably) denoted by . A pure strategy profile, which is one pure strategy for each player in the game, is denoted by , where is the pure strategy of player . Hence, is the set of pure strategy profiles. A pure strategy choice for all players except player is denoted by , i.e. . The payoff of player for a pure strategy profile is denoted by (or ), which is a random variable with probability density function (pdf) and mean . The payoffs for and are considered to be continuous-type random variables that are independent from each other. The same analysis holds for discrete-type random variables if the analysis is treated with a bit more subtlety as discussed in the end of this section.
For any set , let be the set of all probability distributions over . The Cartesian product of all players’ mixed strategy sets, , is the set of mixed strategy profiles. Denote a specific mixed strategy of player by , where is the probability that player plays strategy . If the players choose to play a mixed strategy , the payoff for player if he/she plays is denoted by . Using the law of total probability, the marginal distribution of has the probability distribution function
where and is the corresponding strategy of player in . Note that for , the random variables and are not independent of each other in a single play of the game.
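The marginalization described above can be illustrated with a minimal sketch. The uniform payoff distributions, the opponent action labels `"a"` and `"b"`, and the helper names `uniform_cdf` and `mixture_cdf` are all hypothetical and chosen only for illustration; the sketch works with cumulative distribution functions, which mix in the same way as densities under the law of total probability.

```python
from typing import Callable, Dict


def uniform_cdf(low: float, high: float) -> Callable[[float], float]:
    """Return the CDF of a Uniform(low, high) payoff (illustrative choice)."""
    def cdf(x: float) -> float:
        if x <= low:
            return 0.0
        if x >= high:
            return 1.0
        return (x - low) / (high - low)
    return cdf


def mixture_cdf(opponent_mix: Dict[str, float],
                conditional_cdfs: Dict[str, Callable[[float], float]]
                ) -> Callable[[float], float]:
    """CDF of a player's payoff for a fixed own action, marginalized over
    the opponent's mixed strategy via the law of total probability."""
    def cdf(x: float) -> float:
        return sum(p * conditional_cdfs[a](x) for a, p in opponent_mix.items())
    return cdf


# Hypothetical opponent mixed strategy and conditional payoff distributions.
opponent_mix = {"a": 0.25, "b": 0.75}
conditional_cdfs = {"a": uniform_cdf(0.0, 1.0), "b": uniform_cdf(0.0, 2.0)}

marginal = mixture_cdf(opponent_mix, conditional_cdfs)
print(marginal(1.0))  # 0.25 * 1.0 + 0.75 * 0.5 = 0.625
```

The marginal payoff is a mixture: each conditional distribution is weighted by the probability that the opponents play the corresponding pure strategy profile.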
3.2 Risk-Averse Equilibrium
In a stochastic game where the payoffs are random variables, playing the Nash equilibrium considering the expected payoffs may create a risky situation; e.g., see yekkehkhany2019risk and yekkehkhanycost and the references therein for examples on multi-armed bandits. The reason is that payoffs with larger expectations may have a larger variance as well. As a result, it may be the case that a strategy with a lower expected payoff is more likely to yield the larger payoff. This concept is mostly helpful when players play the game once, so they do not have the chance to repeat the game and gain a larger cumulative payoff by playing the strategy with the largest expected payoff. As a result, we propose the risk-averse equilibrium in a probabilistic sense rather than in an expectation sense as in the Nash equilibrium. From an individual player’s point of view, the best response to a mixed strategy of the rest of players is defined as follows.
The set of mixed strategy risk-averse best responses of player to the mixed strategy profile is the set of all probability distributions over the set
where what we mean by being greater than or equal to when is that is greater than or equal to for all ; otherwise, if , player only has a single option that can be played. The same randomness on the action of players is considered in for all , and independent randomness on actions is analyzed in the Appendix. We denote the risk-averse best response set of player ’s strategies, given the other players’ mixed strategies , by , which is in general a set-valued function.
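As a rough illustration of this definition, the following sketch estimates by Monte Carlo the probability that each of a player's actions yields the highest payoff against a fixed opponent mixed strategy. Everything concrete here is an assumption made for the example: the action names, the payoff distributions in `safe_payoff` and `risky_payoff`, and the helper `win_probabilities`. The opponent's action is drawn once per trial and shared across the player's actions, matching the "same randomness" convention stated above.

```python
import random
from typing import Callable, Dict, List, Tuple

Sampler = Callable[[random.Random], float]


def safe_payoff(rng: random.Random) -> float:
    # Hypothetical low-variance payoff with mean 1.0.
    return rng.uniform(0.9, 1.1)


def risky_payoff(rng: random.Random) -> float:
    # Hypothetical high-variance payoff with mean 0.3 * 5 + 0.7 * 0.25 = 1.675.
    if rng.random() < 0.3:
        return rng.uniform(4.0, 6.0)
    return rng.uniform(0.0, 0.5)


def win_probabilities(own_actions: List[str],
                      opponent_mix: Dict[str, float],
                      samplers: Dict[Tuple[str, str], Sampler],
                      trials: int = 20000,
                      seed: int = 0) -> Dict[str, float]:
    """Estimate, for each own action, the probability that it yields the
    highest payoff in a single realization of the game."""
    rng = random.Random(seed)
    opp_actions = list(opponent_mix)
    opp_weights = [opponent_mix[b] for b in opp_actions]
    wins = {a: 0 for a in own_actions}
    for _ in range(trials):
        # One shared draw of the opponent's action per trial.
        b = rng.choices(opp_actions, weights=opp_weights)[0]
        payoffs = {a: samplers[(a, b)](rng) for a in own_actions}
        wins[max(payoffs, key=payoffs.get)] += 1
    return {a: wins[a] / trials for a in own_actions}


own_actions = ["safe", "risky"]
opponent_mix = {"left": 0.5, "right": 0.5}
samplers = {(a, b): (safe_payoff if a == "safe" else risky_payoff)
            for a in own_actions for b in opponent_mix}

probs = win_probabilities(own_actions, opponent_mix, samplers)
best_response = max(probs, key=probs.get)
print(probs, best_response)
```

In this toy setup the risky action has the larger mean but is the better action in only about 30% of realizations, so the risk-averse best response is the safe action even though an expected-utility maximizer would choose the risky one.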
Given the definition of the risk-averse best response, the risk-averse equilibrium (RAE) is defined as follows.
A strategy profile is a risk-averse equilibrium (RAE), if and only if for all .
The following theorem establishes the existence of a mixed strategy risk-averse equilibrium for a game with a finite number of players and a finite number of strategies per player.
For any finite -player game, a risk-averse equilibrium exists.
Proof: Consider the risk-averse best response function defined as . The existence of a risk-averse equilibrium is equivalent to the existence of a fixed point such that . Kakutani’s Fixed Point Theorem is used to prove the existence of a fixed point for . In order to use Kakutani’s theorem, the four conditions listed below should be satisfied, which are proven as follows.
is a nonempty subset of a finite dimensional Euclidean space, compact, and convex: is nonempty since it is the Cartesian product of nonempty simplices as each player has at least one feasible pure strategy. is bounded since each of its elements is between zero and one, and is closed since it is the Cartesian product of simplices, so contains all its limit points. Being closed and bounded in a finite dimensional Euclidean space, it is compact. It is convex since each simplex is convex and the Cartesian product of convex sets is convex.
is nonempty for all : is the set of all probability distributions over the set specified in Equation (2), where the mentioned set is nonempty since a maximum always exists over a finite number of values.
is a convex set for all : It suffices to prove that is a convex set for all . Consider and . Define the support of and as and , respectively. From the definition of risk-averse best response in Definition 3.2, . As a result, , and again due to definition of risk-averse best response, any probability distribution over the set is also a best response to . The mixed strategy is obviously a valid probability distribution over the set , so that completes the proof for convexity of the set .
has a closed graph: has a closed graph if for any sequence with for all , we have . The fact that has a closed graph is proved by contradiction. Consider that does not have a closed graph. Then, there exists a sequence with for all , but . This means there exists some such that . As a result, due to the definition of risk-averse best response in Definition 3.2, there exists , , where can be any of the strategies in the set , and some such that
Given that payoffs are continuous random variables and, for a sufficiently large we have
Due to the same reasoning as for Equation (4), for a sufficiently large we have
However, Equation (7) contradicts the fact that .
The above four properties of the risk-averse best response function fulfil the conditions for Kakutani’s Fixed Point Theorem. This means that for a finite -player game, there always exists such that , where by definition is a mixed strategy risk-averse equilibrium. ∎
3.3 Pure Strategy Risk-Averse Equilibrium
The pure strategy risk-averse best response is defined in the following as a specific case of the risk-averse best response defined in Definition 3.2.
Pure strategy of player is a risk-averse best response (RB) to the pure strategy of the other players if
where what we mean by being greater than or equal to is that is greater than or equal to for all . We denote the risk-averse best response set of player , given the other players’ pure strategies , by (overloading notation, is used for both pure and mixed strategy risk-averse best response).
Given the definition of the pure strategy risk-averse best response, the pure strategy risk-averse equilibrium (RAE), which does not necessarily exist, is defined below.
A pure strategy profile is a pure strategy risk-averse equilibrium (RAE), if and only if for all .
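For a two-player game, the pure strategy definition can be checked by brute force once the relevant probabilities are known. The sketch below assumes these probabilities are already available as two matrices; the numbers are hypothetical and are not taken from the examples in this paper, and the function name `pure_risk_averse_equilibria` is ours.

```python
from itertools import product
from typing import List, Tuple


def pure_risk_averse_equilibria(P1: List[List[float]],
                                P2: List[List[float]]) -> List[Tuple[int, int]]:
    """Enumerate pure profiles where neither player can deviate to an action
    that is more likely to give them their highest payoff.

    P1[r][c]: probability that row action r is the row player's best action
              when the column player plays c.
    P2[r][c]: probability that column action c is the column player's best
              action when the row player plays r.
    """
    rows, cols = len(P1), len(P1[0])
    equilibria = []
    for r, c in product(range(rows), range(cols)):
        row_ok = all(P1[r][c] >= P1[k][c] for k in range(rows))
        col_ok = all(P2[r][c] >= P2[r][k] for k in range(cols))
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Hypothetical probabilities for a 2x2 game with continuous payoffs, so each
# column of P1 and each row of P2 sums to one.
P1 = [[0.6, 0.3],
      [0.4, 0.7]]
P2 = [[0.6, 0.4],
      [0.3, 0.7]]

equilibria = pure_risk_averse_equilibria(P1, P2)
print(equilibria)  # [(0, 0), (1, 1)]
```

An empty result is possible, which is consistent with the remark that a pure strategy risk-averse equilibrium does not necessarily exist.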
3.4 Strict Dominance and Iterated Elimination of Strictly Dominated Strategies
Probably the most basic solution concept for a game is the dominant strategy equilibrium. Strict dominance is defined as follows.
A pure strategy of player strictly dominates a second pure strategy of the player if
A strictly dominated strategy cannot be the risk-averse best response to any mixed strategy profile of other players due to the following reason. Consider that is strictly dominated by for player as is stated in Definition 3.4. Then, for any , we have
where is followed by using the law of total probability, and is the corresponding strategy of player in , and is true by the assumption that the pure strategy is strictly dominated by the pure strategy and using Equation (9) in Definition 3.4 on strict dominance. By Equation (10) and Equation (2) in Definition 3.2 on the best response to a mixed strategy profile of other players, a strictly dominated pure strategy can never be a best response to any mixed strategy profile of other players. As a result, a strictly dominated pure strategy can be removed from the set of strategies of a player and iterated elimination of strictly dominated strategies can be applied to a game under the risk-averse framework.
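A single elimination pass can be sketched as follows. The sketch assumes that strict dominance means a strictly larger probability of being the best action against every pure strategy of the opponent, and that these probabilities are given as a matrix; both the matrix values and the name `strictly_dominated_rows` are hypothetical. Because each probability is defined relative to the player's current set of actions, the matrix would need to be recomputed over the surviving actions before the next pass, a step the sketch omits.

```python
from typing import List


def strictly_dominated_rows(P: List[List[float]]) -> List[int]:
    """Return indices of row actions that are strictly dominated.

    P[a][c] is the (assumed given) probability that row action a yields the
    row player's highest payoff when the opponent plays pure strategy c.
    Action a is dominated if some other action b has a strictly larger
    probability for every opponent strategy c.
    """
    rows, cols = len(P), len(P[0])
    dominated = []
    for a in range(rows):
        for b in range(rows):
            if b != a and all(P[b][c] > P[a][c] for c in range(cols)):
                dominated.append(a)
                break
    return dominated


# Hypothetical probabilities for three row actions against two opponent
# actions; each column sums to one.
win_prob = [[0.5, 0.6],
            [0.3, 0.3],
            [0.2, 0.1]]

dominated = strictly_dominated_rows(win_prob)
print(dominated)  # [1, 2]
```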
3.5 Finding the Risk-Averse Equilibrium
The mixed strategy risk-averse equilibrium of a game can be found by choosing players’ mixed strategy profiles in such a way that a player cannot strategize against other players. In other words, under a mixed strategy risk-averse equilibrium, all players are indifferent to their mixed strategies, so they use a mixed strategy to make other players indifferent as well. If all players are indifferent to their mixed risk-averse strategies, then no player has an incentive to change strategies, so we end up with a mixed strategy risk-averse equilibrium. Formally speaking, a risk-averse mixed strategy is characterized by for all and for all , so there are parameters that should be found. Letting the mixed strategy profile for players be , then in order for player to be indifferent to his/her set of strategies among a subset , we need to have
The above equations reveal independent equations for each player , so in total equations are derived. The remaining equations are provided by the fact that the mixed strategy of each player adds to one for their set of strategies. As a result, if there is a mixed strategy risk-averse equilibrium for which only a subset of the pure strategies, denoted as the support of the equilibrium, are played with non-zero probability, this equilibrium is a solution of the following set of equations for :
Any solution to Equation set (11) is a risk-averse equilibrium, so we can check if an equilibrium exists for any support .
Note that as is stated in Equation (10), we have the following by using the law of total probability:
where and is the corresponding strategy of player in . Hence, Equation (12) is a polynomial of order in terms of for and . We can define a risk-averse probability tensor of dimension , where the -th dimension has all pure strategies and each element of the tensor is an dimensional vector defined in the following. The -th element of the dimensional vector corresponding to the pure strategy profile is defined as
As a result, an equivalent approach for finding the risk-averse equilibrium is to find the Nash equilibrium of the risk-averse probability tensor, as any such Nash equilibrium must maximize the probability of playing a utility-maximizing response to for each player . In the following two subsections, two illustrative examples are provided to make the concept of the risk-averse equilibrium clear.
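For a two-player game with two actions each, the indifference conditions reduce to one linear equation per player. The following sketch solves them exactly for a fully mixed equilibrium of a probability bi-matrix; the matrices are hypothetical, the function name `interior_mixed_equilibrium` is ours, and the sketch only covers the case where both players mix over both actions.

```python
from fractions import Fraction as F
from typing import List, Optional, Tuple


def interior_mixed_equilibrium(P1: List[List[F]],
                               P2: List[List[F]]) -> Optional[Tuple[F, F]]:
    """Solve the 2x2 indifference conditions on a probability bi-matrix.

    P1[r][c]: probability that row action r is the row player's best action
              against column action c; P2[r][c] is the analogue for the
              column player's action c against row action r.
    Returns (p, q), where p is the probability of row action 0 and q the
    probability of column action 0, or None if no fully mixed solution exists.
    """
    # Row player's p makes the column player indifferent between columns.
    den_p = P2[0][0] - P2[0][1] + P2[1][1] - P2[1][0]
    # Column player's q makes the row player indifferent between rows.
    den_q = P1[0][0] - P1[1][0] + P1[1][1] - P1[0][1]
    if den_p == 0 or den_q == 0:
        return None
    p = (P2[1][1] - P2[1][0]) / den_p
    q = (P1[1][1] - P1[0][1]) / den_q
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return p, q


# Hypothetical probability bi-matrix for a 2x2 game.
P1 = [[F(3, 5), F(3, 10)],
      [F(2, 5), F(7, 10)]]
P2 = [[F(3, 5), F(2, 5)],
      [F(3, 10), F(7, 10)]]

solution = interior_mixed_equilibrium(P1, P2)
print(solution)  # (Fraction(2, 3), Fraction(2, 3))
```

Exact rational arithmetic is used so that the indifference equalities hold without floating-point tolerance; for larger games a general Nash solver applied to the probability tensor would be needed instead.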
3.6 Illustrative Example 1
The following example is presented to shed light on the notion of pure strategy risk-averse equilibrium. Consider a game between two players where each player has two pure strategies, and , with independent payoff distributions specified as
and are independent and have the same pdf as ,
and are independent and have the same pdf as ,
and are independent and have the same pdf as ,
and are independent and have the same pdf as ,
where , and are constants for which each of the corresponding distributions integrate to one and is the indicator function.
The above example is depicted in Figure 1. Considering the expected payoffs in Example 3.6, , , and , the pure Nash equilibria of the game are and , and the mixed Nash equilibrium is that the first player selects with probability two-thirds and selects otherwise and the second player selects with probability two-thirds and selects otherwise. On the other hand, by using the payoff density functions we have and , which are used to form the risk-averse probability bi-matrix of the game derived based on Equation (13). The risk-averse probability matrix is depicted in Figure 1. According to Definition 3.3, is a pure strategy risk-averse equilibrium that is different from the Nash equilibria of the game. Taking a close look at the payoff distributions, is less risky than and in a single round of the game.
3.7 Illustrative Example 2
In this subsection, the mixed strategy risk-averse equilibrium of a two-player game proposed in the following example is computed. Consider a game between two players where each player has two pure strategies, and , with independent payoff distributions specified as
and are independent and have the same pdf as ,
, and are independent and have the same pdf as ,
and are independent and have the same pdf as ,
where and are constants for which each of the corresponding distributions integrate to one.
The above example is depicted in Figure 2. Considering the expected payoffs in Example 3.7, , , and , the pure Nash equilibrium of the game is as depicted in Figure 2 with no mixed strategy Nash equilibria. On the other hand, by using the payoff density functions we have and , which are used to form the risk-averse probability bi-matrix of the game derived based on Equation (13). The risk-averse probability matrix is depicted in Figure 2. According to Definition 3.3, and are the pure strategy risk-averse equilibria. In order to find the mixed strategy risk-averse equilibrium, consider that the first player selects with probability and selects otherwise. Given the first player’s mixed strategy , with a little misuse of notation, denote the random variables denoting the second player’s payoffs by selecting or with and , respectively. The second player is indifferent between selecting and if . Since payoffs are continuous random variables, ; as a result, the second player is indifferent between the strategies if . By using the law of total probability and independence of payoff distributions, can be computed as
Letting , then , which determines the mixed strategy risk-averse equilibrium. As a result, due to symmetry, and form the mixed strategy risk-averse equilibrium of the game in Example 3.7.
It is easy to verify that the game proposed in Example 3.6 does not have any mixed strategy risk-averse equilibria. The game in Example 3.6 has both pure and mixed strategy Nash equilibria, but it only has a pure strategy risk-averse equilibrium. On the other hand, the game in Example 3.7 only has a pure strategy Nash equilibrium, but it has both pure and mixed risk-averse equilibria. As can be seen, the distributions of payoffs can have a significant impact on the behavior of players if they take risk into account when making their decisions.
As mentioned earlier in this section, the analysis for risk-averse equilibrium holds for discrete-type random variables as well. For example, consider random variables , and with distributions
Denote the observations of the three random variables by triple and let be the event that is greater than or equal to both and . Then
As can be seen, . In order to resolve this issue, we can break ties uniformly at random as
which results in .
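The effect of ties can be checked by exact enumeration. The sketch below uses three hypothetical independent discrete payoffs, which are not the distributions of the example above, and a helper `best_probabilities` of our own naming. Without tie-breaking, the probabilities of being at least as large as the others can sum to more than one; splitting each tie uniformly among the tied payoffs restores a total of one.

```python
from fractions import Fraction as F
from itertools import product
from typing import Dict, List


def best_probabilities(pmfs: List[Dict[int, F]], break_ties: bool) -> List[F]:
    """Probability that each independent discrete payoff attains the maximum.

    With break_ties=False, every tied maximizer is credited in full; with
    break_ties=True, the credit is split uniformly among tied maximizers.
    """
    probs = [F(0)] * len(pmfs)
    for outcome in product(*(pmf.items() for pmf in pmfs)):
        values = [value for value, _ in outcome]
        weight = F(1)
        for _, p in outcome:
            weight *= p  # independence: joint probability is the product
        top = max(values)
        winners = [k for k, value in enumerate(values) if value == top]
        share = F(1, len(winners)) if break_ties else F(1)
        for k in winners:
            probs[k] += weight * share
    return probs


# Hypothetical payoff distributions: value -> probability.
pmfs = [{1: F(1, 2), 2: F(1, 2)},
        {1: F(1, 2), 2: F(1, 2)},
        {2: F(1)}]

without_ties = best_probabilities(pmfs, break_ties=False)
with_ties = best_probabilities(pmfs, break_ties=True)
print(without_ties, sum(without_ties))  # probabilities 1/2, 1/2, 1; sum 2
print(with_ties, sum(with_ties))        # probabilities 5/24, 5/24, 7/12; sum 1
```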
4 Finite-Time Commit Games
The risk-averse framework discussed in Section 3 provides risk-averse players with pure or mixed strategies such that given the other players’ strategies, risk-averse equilibrium maximizes the probability that a player is rewarded the most in a single round of the game rather than maximizing the expected received reward. On the other hand, for infinite rounds of playing the game, given the other players’ strategies, selecting the strategy that maximizes the expected reward guarantees maximum cumulative reward. However, the rewards may not be satisfying for a risk-averse player in each and every round of playing the game. As a result, risk-averse players may even choose to play the risk-averse equilibrium in infinite (or finite) rounds of games to have more or less balanced rewards in all rounds of the game rather than have maximum cumulative reward in the end. Despite this fact, we present a slightly different approach for finite-time games that aims to maximize not the expected cumulative reward but rather the probability of receiving the highest cumulative reward. Note that the proposed equilibrium for finite-time commit games in this section may be different from the Nash equilibrium or the equilibrium presented in Section 3.
Consider that the players plan to play a game for independent times where all players have to commit to the pure strategy they play in the first round for the whole game. The strategy in the first round of the game does not have to be pure and can be mixed, but a player has to commit to the randomly sampled pure strategy according to the mixed strategy for times. Let , where for are independent and identically distributed random variables and . If players choose to play for the whole game with rounds, the random variable denotes the cumulative payoff for player in the end of the plays and .
If the players choose to play a mixed strategy in the first round of the game and commit to it for other rounds of the game, using the law of total probability, the distribution of the cumulative payoff for player in the end of the game when he/she plays , denoted by , has the probability distribution function
where and is the corresponding strategy of player in . Note that for , the random variables and are not independent from each other in a single play of the game. The risk-averse equilibrium for an -time commit game can be derived similarly to the derivations in Section 3 and is described below. From an individual player’s point of view, the best response to a mixed strategy of the rest of the players for an -time commit game is defined as follows.
The set of mixed strategy risk-averse best responses of player to the mixed strategy profile for an -time commit game is the set of all probability distributions over the set
where what we mean by being greater than or equal to is that is greater than or equal to for all ; otherwise, if , player only has a single option that can be played. We denote the risk-averse best response set of player ’s mixed strategies for an -time commit game, given the other players’ mixed strategies , by , which is a set-valued function.
Given the definition of the risk-averse best response for -time commit games, the risk-averse equilibrium (RAE) for -time commit games is defined as follows.
A strategy profile is a risk-averse equilibrium (RAE) for an -time commit game, if and only if for all .
The following corollary follows directly from Theorem 3.2.
For any finite -player finite-time commit game, a risk-averse equilibrium exists.
The pure strategy risk-averse best response for an -time commit game is defined in the following as a specific case of the risk-averse best response defined in Definition 4.
Pure strategy of player is a risk-averse best response (RB) to the pure strategy of the other players for an -time commit game if
where what we mean by being greater than or equal to is that is greater than or equal to for all . We denote the risk-averse best response set of player for an -time commit game, given the other players’ pure strategies , by (overloading notation, is used for both mixed and pure strategy risk-averse best response for -time commit games).
Given the definition of the pure strategy risk-averse best response for an -time commit game, the pure strategy risk-averse equilibrium (RAE), which does not necessarily exist, is defined below.
A pure strategy profile is a pure strategy risk-averse equilibrium (RAE) for an -time commit game, if and only if for all .
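To see how commitment over several rounds can change the risk-averse best response, the following sketch compares cumulative payoffs of two hypothetical actions against a fixed opponent strategy. The distributions in `safe_draw` and `risky_draw`, the round counts, and the function `risky_wins_probability` are assumptions made for illustration only. Each trial sums independent and identically distributed draws for each action and records which cumulative payoff is larger.

```python
import random


def safe_draw(rng: random.Random) -> float:
    # Hypothetical low-variance per-round payoff with mean 1.0.
    return rng.uniform(0.9, 1.1)


def risky_draw(rng: random.Random) -> float:
    # Hypothetical high-variance per-round payoff with mean 1.675.
    if rng.random() < 0.3:
        return rng.uniform(4.0, 6.0)
    return rng.uniform(0.0, 0.5)


def risky_wins_probability(rounds: int, trials: int, seed: int) -> float:
    """Estimate the probability that committing to the risky action gives a
    larger cumulative payoff than committing to the safe action."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        safe_total = sum(safe_draw(rng) for _ in range(rounds))
        risky_total = sum(risky_draw(rng) for _ in range(rounds))
        if risky_total > safe_total:
            wins += 1
    return wins / trials


p_single = risky_wins_probability(rounds=1, trials=5000, seed=0)
p_many = risky_wins_probability(rounds=20, trials=5000, seed=0)
print(p_single, p_many)
```

In this toy setting the safe action is the more likely winner in a single round, while the risky action becomes the more likely winner once the commitment spans enough rounds, which illustrates why the equilibrium of a finite-time commit game may differ from both the single-round risk-averse equilibrium and the Nash equilibrium.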
5 Numerical Results
In this section, the classical Nash equilibrium is compared with the proposed risk-averse equilibrium. To this end, the likelihood of receiving the higher reward in a two-player game is evaluated under the two types of equilibria for the following example.
Consider a game between two players where each player has two pure strategies,