Game theory is the study of interaction between rational agents [leytonbrown2008essentials]. The field has influenced entire areas of research, with a notable example being economics [samuelson2016game], and shaped applications such as complex energy systems [he2020application]. One of the most studied settings in game theory is the (single-objective) Normal-Form Game (NFG). In this context, Nash [nash1951non] showed already in 1951 that optimal outcomes, in the sense of Nash equilibria, must exist for agents in these settings. An important limitation of the traditional formulation of such games is their restriction to scalar payoffs. Throughout the years, many authors have stressed the need to extend this theory to games with vectorial payoffs, rather than only considering the scalar case [blackwell1954analog, shapley1959equilibrium, corley1985games, wierzbicki1995multiple]. The motivation for this comes from the fact that many real-world decision-making scenarios inherently contain multiple (conflicting) criteria. As an illustration, consider the case of a commuter deciding between taking their car or bike to work with the objectives of maximising speed while also minimising fuel consumption. Such scenarios subsequently return multiple payoffs. In multi-objective games, these payoffs can be modelled as a vector where each entry corresponds to the payoff for a specific objective.
We study the problem of Nash equilibria in games with vectorial payoffs, i.e., Multi-Objective Normal-Form Games (MONFGs) [blackwell1954analog]. To practically deal with vectorial rewards, we take a utility-based approach, assuming that each agent has a utility function that can be used to scalarise payoff vectors [roijers2013survey, roijers2017multi]. One problem that arises with this approach is that it is not immediately clear at which stage of the process to apply the utility function [roijers2013survey]. In NFGs we deal with payoffs of mixed strategies by calculating the expected payoff of such a strategy. In the case of vectorial payoffs, this becomes considerably more complex as we have two different approaches, also referred to as optimisation criteria. On the one hand, we could scalarise the different possible outcomes and calculate the expectation of these utilities. This approach is also called the Expected Scalarised Returns (ESR) criterion and is equivalent to scalarising the multi-objective game with the provided utility functions, turning it into a single-objective normal-form game. We will sometimes refer to this resulting single-objective game as the trade-off game, consistent with terminology used in other works [radulescu2020utility]. Conversely, we might first take the expectation of the mixed strategy and subsequently scalarise this expected payoff vector, also referred to as the Scalarised Expected Returns (SER) criterion [radulescu2020multi]. Recent work showed that when players have non-linear utility functions, these two criteria are not equivalent and that Nash equilibria need not exist under SER [radulescu2020utility].
Because the two optimisation criteria are not equivalent and the existence of Nash equilibria is not guaranteed for the latter, it is in general not appropriate to simply scalarise such vectorial payoffs a priori into single-objective NFGs, which would allow us to apply traditional game-theoretical concepts and techniques. We note that in principle, we could also employ a utility-function-agnostic approach where no knowledge of the utility function is assumed. However, this implies that we consider expected payoff vectors in the case of mixed strategies, leading us to the SER criterion where the final scalarisation is simply unknown.
We focus on five distinct aspects of MONFGs. First, we aim to provide guarantees for the existence of a Nash equilibrium under SER by imposing sufficient restrictions on the type of utility functions that can be used. Our second goal is to study the relationship between the two optimisation criteria, SER and ESR, when both have Nash equilibria. This culminates in the conclusion that under non-linear utility functions, the number of Nash equilibria under both criteria need not be equal and no equilibria need be shared. Our third contribution restricts the Nash equilibria in question to pure strategy Nash equilibria, to find equivalences between an MONFG with known utility functions and its trade-off game. Next, we extend these results to games where some agents are optimising for SER and others for ESR. We refer to this as a blended setting and formally define Nash equilibria in such games. Lastly, we utilise the theoretical results from our study in an algorithm that is able to calculate the pure strategy Nash equilibria in a given MONFG with quasiconvex utility functions. Concretely, we contribute the following:
We prove the existence of a Nash equilibrium in MONFGs under the SER criterion when all agents have continuous quasiconcave utility functions.
We show that assuming only strictly convex utility functions is not a sufficient guarantee for Nash equilibria to exist under SER.
We show that even when Nash equilibria exist under both criteria, i.e. ESR and SER, the number of Nash equilibria need not be equal and no equilibria need be shared.
We prove that pure strategy Nash equilibria under SER must also be Nash equilibria under ESR, regardless of the utility functions the agents have.
We prove that if all agents have quasiconvex utility functions, the pure strategy Nash equilibria are equivalent under SER and ESR.
We define Nash equilibria in blended settings where some agents are optimising for the SER criterion while others for ESR. We subsequently show that in such settings, pure strategy Nash equilibria can be retrieved from the trade-off game when only quasiconvex utility functions are considered.
We construct an algorithm that calculates a subset or all of the pure strategy Nash equilibria in a given MONFG with quasiconvex utility functions.
In this section, we introduce the necessary game-theoretical and mathematical background to ensure that this work is self-contained. We further provide an overview of the utility-based approach which we follow to arrive at our novel theorems.
2.1 Normal-Form Games
Normal-Form Games (NFGs) present a concise approach for reasoning about stateless $n$-player interactions. In such games, the payoff for every player depends on the joint action that is selected. Formally [leytonbrown2008essentials]:
Definition 1 (Normal-Form Game).
A (finite, $n$-player) normal-form game is a tuple $(N, \mathcal{A}, u)$, where:
$N = \{1, \dots, n\}$ is a finite set of players, indexed by $i$;
$\mathcal{A} = \mathcal{A}_1 \times \dots \times \mathcal{A}_n$, where $\mathcal{A}_i$ is a finite set of actions available to player $i$. Each vector $a = (a_1, \dots, a_n) \in \mathcal{A}$ is called an action profile;
$u = (u_1, \dots, u_n)$, where $u_i : \mathcal{A} \to \mathbb{R}$ is a real-valued payoff function for player $i$, given an action profile $a \in \mathcal{A}$.
We can represent a 2-player NFG as a matrix where the payoff of each joint action is shown in the associated cell. Table 1 shows a well-known example of such a matrix game, namely the prisoner's dilemma. As an illustration of the payoff mechanism, if the row player in this game opts to defect while the column player chooses to cooperate, the row player receives the temptation payoff, the highest in the game, while the column player receives the sucker's payoff, the lowest.
In general, players are not restricted to only playing a single action, also known as a pure strategy. Rather, they can introduce randomness into their strategy by playing a mix of actions according to some probability distribution, known as a mixed strategy. We formally define this as follows:
Definition 2 (Mixed strategy).
Let $(N, \mathcal{A}, u)$ be a normal-form game, and for any set $X$ let $\Delta(X)$
be the set of all probability distributions over $X$. Then the set of mixed strategies for player $i$ is $S_i = \Delta(\mathcal{A}_i)$.
Note that a pure strategy is a special case of a mixed strategy where one action is selected with probability 1. When dealing with mixed strategies, the associated payoffs are not directly clear from the payoff functions. To remedy this, we utilise the concept of expected payoff. Intuitively, this expected payoff is the average payoff that will be obtained when playing the mixed strategy a large number of times.
Definition 3 (Expected payoff of a mixed strategy).
Given a normal-form game $(N, \mathcal{A}, u)$, the expected payoff for player $i$ of the mixed strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$ is defined as
$$u_i(\sigma) = \sum_{a \in \mathcal{A}} u_i(a) \prod_{j \in N} \sigma_j(a_j).$$
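As a concrete sketch of this definition, the snippet below computes the expected payoff of a mixed strategy profile in a small 2-player game. The payoff values and the dictionary layout are illustrative assumptions, not taken from the paper.

```python
from itertools import product

def expected_payoff(payoffs, strategies, player):
    """Expected payoff of `player` under a mixed strategy profile.

    payoffs[i][a] is player i's scalar payoff for joint action a;
    strategies[i][a_i] is the probability that player i plays action a_i.
    """
    total = 0.0
    action_sets = [range(len(s)) for s in strategies]
    for joint in product(*action_sets):
        # Probability of this joint action under independent mixing.
        prob = 1.0
        for i, a_i in enumerate(joint):
            prob *= strategies[i][a_i]
        total += prob * payoffs[player][joint]
    return total

# Hypothetical 2x2 game (values are illustrative only).
u_row = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}
u_col = {(0, 0): 3.0, (0, 1): 5.0, (1, 0): 0.0, (1, 1): 1.0}
payoffs = [u_row, u_col]

# Both players mix uniformly over their two actions.
sigma = [[0.5, 0.5], [0.5, 0.5]]
print(expected_payoff(payoffs, sigma, 0))  # (3 + 0 + 5 + 1) / 4 = 2.25
```

Each joint action contributes its payoff weighted by the product of the players' individual action probabilities, exactly as in the sum above.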
2.2 Nash Equilibria
We assume players to be rational decision makers, implying that they aim to optimise their expected payoff. Because a player's expected payoff also depends on the strategies of the other players, we cannot optimise a strategy in isolation. To overcome this, we define interesting groups of outcomes called solution concepts. A fundamental solution concept that we use in this work is the Nash equilibrium (NE) [nash1951non]. Intuitively, one can understand a Nash equilibrium as a joint strategy from which no player can unilaterally deviate while improving their expected payoff.
To capture the incentive for deviating, we must first define the concept of best responses. A best response is a mixed strategy that maximises a player's expected payoff given all other players' strategies. Note that such a best response need not be unique. For the purpose of notation, we define $\sigma_{-i}$ as the strategy profile without the strategy of player $i$, so that we may write $\sigma = (\sigma_i, \sigma_{-i})$. Formally:
Definition 4 (Best Response).
Player $i$'s best response to the strategy profile $\sigma_{-i}$ is a mixed strategy $\sigma_i^* \in S_i$ such that $u_i(\sigma_i^*, \sigma_{-i}) \geq u_i(\sigma_i, \sigma_{-i})$ for all strategies $\sigma_i \in S_i$.
Recall that in a Nash equilibrium, no rational player wishes to deviate from the joint strategy. We can thus define a Nash equilibrium as the strategy profile in which each strategy is a best response to all other strategies:
Definition 5 (Nash Equilibrium).
A strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$ is a Nash equilibrium if, for all agents $i \in N$, $\sigma_i$ is a best response to $\sigma_{-i}$.
Note that if each strategy $\sigma_i$ in the NE is a pure strategy, the resulting NE is called a pure strategy Nash equilibrium.
We show the prisoner’s dilemma example from Section 2.1 in Table 2 where the highlighted cell (Defect, Defect) represents the equilibrium. Observe that no player can improve their payoff by deviating, as playing another action can only reduce their expected payoff.
We highlight the fact that in general, Nash equilibria can be mixed strategies. An intuitive example of such a mixed strategy equilibrium occurs in the game of rock-paper-scissors [roughgarden2016introduction]. Every strategy that is not a uniform distribution over the three options can be exploited by the opponent, leading to a single Nash equilibrium where agents play each action with probability $\frac{1}{3}$.
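This equilibrium is easy to verify numerically. The sketch below uses the standard rock-paper-scissors payoffs (win 1, draw 0, lose -1) and checks that no pure deviation improves on the uniform strategy, while a non-uniform strategy can be exploited.

```python
# Row player's payoff matrix for rock-paper-scissors:
# rows/columns are (rock, paper, scissors); 1 = win, 0 = draw, -1 = loss.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def row_payoff(row_strategy, col_strategy):
    """Expected payoff of the row player for two mixed strategies."""
    return sum(row_strategy[i] * col_strategy[j] * RPS[i][j]
               for i in range(3) for j in range(3))

uniform = [1 / 3, 1 / 3, 1 / 3]

# Against the uniform strategy, every pure action earns the same payoff (0),
# so no unilateral deviation improves on it: the uniform profile is a NE.
pure = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert all(abs(row_payoff(p, uniform)) < 1e-12 for p in pure)

# A non-uniform strategy is exploitable: against "always rock",
# the best response "always paper" wins outright.
assert row_payoff([0, 1, 0], [1, 0, 0]) == 1
```

Since the game is zero-sum and symmetric, the same check applies to the column player by symmetry.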
2.3 Multi-Objective Normal-Form Games
Multi-Objective Normal-Form Games (MONFGs) can intuitively be understood as the generalisation of (single-objective) NFGs to vectorial payoffs. We define this below [radulescu2020utility]:
Definition 6 (Multi-objective normal-form game).
A (finite, $n$-player) multi-objective normal-form game is a tuple $(N, \mathcal{A}, \mathbf{p})$, with $d \geq 2$ objectives, where:
$N = \{1, \dots, n\}$ is a finite set of players, indexed by $i$;
$\mathcal{A} = \mathcal{A}_1 \times \dots \times \mathcal{A}_n$, where $\mathcal{A}_i$ is a finite set of actions available to player $i$. Each vector $a = (a_1, \dots, a_n) \in \mathcal{A}$ is called an action profile;
$\mathbf{p} = (\mathbf{p}_1, \dots, \mathbf{p}_n)$, where $\mathbf{p}_i : \mathcal{A} \to \mathbb{R}^d$ is the vectorial payoff function for player $i$, given an action profile.
We denote player $i$'s payoff function $\mathbf{p}_i$ in bold to emphasise the fact that we are dealing with vectors rather than scalars. We can represent 2-player MONFGs analogously to the way we represented 2-player NFGs as a matrix and show an example in Table 3. To illustrate, if the row player opts for action B while the column player chooses A, the row player receives a payoff of $(1, 0)$ and the column player $(0, 1)$.
|   | A              | B              |
| A | (1, 1); (0, 0) | (0, 1); (1, 0) |
| B | (1, 0); (0, 1) | (0, 0); (1, 1) |
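A minimal way to encode the MONFG of Table 3 programmatically is as a mapping from joint actions to one payoff vector per player; the dictionary layout below is an illustrative choice, not prescribed by the paper.

```python
# The MONFG from Table 3: each cell holds one payoff vector per player.
# monfg[(row_action, col_action)] = (row player's vector, col player's vector)
monfg = {
    ("A", "A"): ((1, 1), (0, 0)),
    ("A", "B"): ((0, 1), (1, 0)),
    ("B", "A"): ((1, 0), (0, 1)),
    ("B", "B"): ((0, 0), (1, 1)),
}

# Looking up the joint action from the running example: row plays B, column plays A.
row_vec, col_vec = monfg[("B", "A")]
print(row_vec, col_vec)  # (1, 0) (0, 1)
```

This representation generalises directly to more players or actions by extending the key tuples and the value tuples.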
The study of MONFGs, and specifically using a utility-based approach, has received much less attention than single-objective NFGs. In addition, work on MONFGs has been fragmented and different assumptions about the setting in which this model is used can lead to vastly different outcomes.
2.4 Utility-Based Approach
To deal with the vectorial payoffs in our setting, we adopt a utility-based approach [roijers2013survey]. This approach assumes for each agent the existence of a utility function that can scalarise vectors to their scalar utility. An example of a utility function that is frequently used is the linear utility function which assigns a weight to each objective in the payoff function and subsequently calculates the weighted sum over all objectives.
$$u_i(\mathbf{p}) = \sum_{o=1}^{d} w_o \, p_o,$$
with $p_o$ the payoff for objective $o$ under the given strategy. In general, utility functions can be non-linear and are only assumed to be monotonically increasing. This intuitively means that we should always favour more over less of an objective, when all other objectives remain the same. Formally:
$$\left(\forall o : p_o \geq p'_o\right) \implies u(\mathbf{p}) \geq u(\mathbf{p}').$$
The utility-based approach has the advantage that scalar utilities can be compared and ranked, which leads to a total ordering. The drawback of this approach is that the utility function can be applied in two distinct ways. As an illustration, consider the problem of a commuter travelling to work [radulescu2020multi]. This commuter cares about two objectives, namely minimising the travel time while maximising the level of comfort. On the one hand, it is possible that they want to optimise the utility they derive from each individual trip. This leads to what is known as the Expected Scalarised Returns (ESR) criterion, where we apply the utility function before taking the expectation [roijers2018multi, radulescu2020utility, hayes2021risk]:
$$u_i^{ESR}(\sigma) = \mathbb{E}_{a \sim \sigma}\left[u_i\left(\mathbf{p}_i(a)\right)\right] = \sum_{a \in \mathcal{A}} u_i\left(\mathbf{p}_i(a)\right) \prod_{j \in N} \sigma_j(a_j).$$
Here, $u_i^{ESR}(\sigma)$ is the scalar utility for player $i$ with utility function $u_i$ while following the joint strategy $\sigma$. If the commuter is making this trip daily, it is also possible that they will aim to optimise the utility they can derive from the average payoff of multiple trips. This leads to the Scalarised Expected Returns (SER) criterion, where we perform the scalarisation after the expectation:
$$u_i^{SER}(\sigma) = u_i\left(\mathbb{E}_{a \sim \sigma}\left[\mathbf{p}_i(a)\right]\right) = u_i\left(\sum_{a \in \mathcal{A}} \mathbf{p}_i(a) \prod_{j \in N} \sigma_j(a_j)\right).$$
We stress that the ESR criterion is nearly identical to the expected payoff of a mixed strategy in single-objective games (Definition 3). The critical difference is that the utility function is applied to each payoff vector $\mathbf{p}_i(a)$. This essentially reduces the multi-objective game to a single-objective game, where the payoff functions in the single-objective game are defined as the composition of the utility function of each player and their vectorial payoff function [radulescu2020utility]. In this work, we often refer to the resulting single-objective game as the trade-off game. This denotes the fact that the scalar payoff for each joint action is a trade-off between the original objectives and is consistent with terminology used in other works [radulescu2020utility].
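To make the reduction concrete, the sketch below scalarises the MONFG of Table 3 into its trade-off game, assuming, purely for illustration, that both players use the linear utility $u(p) = 0.5\,p_1 + 0.5\,p_2$.

```python
# The MONFG of Table 3.
monfg = {
    ("A", "A"): ((1, 1), (0, 0)),
    ("A", "B"): ((0, 1), (1, 0)),
    ("B", "A"): ((1, 0), (0, 1)),
    ("B", "B"): ((0, 0), (1, 1)),
}

def u(vec):
    # Illustrative linear utility with equal weights on both objectives.
    return 0.5 * vec[0] + 0.5 * vec[1]

# Compose each payoff vector with the utility to obtain the trade-off game.
trade_off = {joint: tuple(u(v) for v in vectors)
             for joint, vectors in monfg.items()}
print(trade_off[("A", "A")])  # (1.0, 0.0)
```

The resulting `trade_off` dictionary is an ordinary single-objective NFG, so any standard equilibrium-finding technique applies to it directly.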
Under non-linear utility functions, the two criteria can provide different results [roijers2013survey]. As a practical example, consider again the commuter. It is possible that this person is looking for the most balanced reward vector (i.e. they want to be equally comfortable and fast) by using a utility function $u(p_1, p_2) = p_1 \cdot p_2$. Further, assume they obtain a reward of $(1, 0)$ on day one and $(0, 1)$ on day two. If we optimise for the ESR criterion, this would result in a utility of $\frac{1}{2} u(1, 0) + \frac{1}{2} u(0, 1) = 0$. On the other hand, if we optimise for the utility we can derive from several executions of the same strategy, this would lead first to the expected reward vector $(0.5, 0.5)$, resulting in a utility of $0.25$.
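The commuter calculation can be replicated in a few lines; the balance-seeking utility $u(p) = p_1 \cdot p_2$ and the two day payoffs are assumptions of the illustration, not general facts about the model.

```python
def u(p):
    # Balance-seeking utility: highest when both objectives are equal.
    return p[0] * p[1]

# Payoff vectors for the two days, each occurring with probability 0.5.
days = [(1.0, 0.0), (0.0, 1.0)]

# ESR: scalarise each outcome first, then take the expectation.
esr = sum(u(p) for p in days) / len(days)

# SER: take the expected payoff vector first, then scalarise.
mean = tuple(sum(component) / len(days) for component in zip(*days))
ser = u(mean)

print(esr, ser)  # 0.0 0.25
```

The gap between the two numbers is exactly the non-equivalence of the criteria under a non-linear utility; for a linear utility the two computations coincide.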
These results show that careful consideration is required when selecting either one of these optimisation criteria. For several motivating examples prescribing a specific optimisation criterion, we refer to [radulescu2021decision]. Furthermore, the work by [radulescu2020utility] proves an important result that an MONFG under the ESR objective can always be reduced to a single-objective NFG a priori, implying that game-theoretic results on games with scalar payoffs can be used to solve such games.
2.5 Nash Equilibria in MONFGs
The introduction of utility functions in Section 2.4 allows us to frame Nash equilibria in MONFGs in terms of their utility. Below we define NE for each of the optimisation criteria. First, under the ESR criterion [radulescu2020utility]:
Definition 7 (Nash equilibrium for expected scalarised returns).
A joint strategy $\sigma$ is a Nash equilibrium in an MONFG under the expected scalarised returns criterion if for all players $i \in N$ and all alternative strategies $\sigma_i'$:
$$\mathbb{E}_{a \sim (\sigma_i, \sigma_{-i})}\left[u_i\left(\mathbf{p}_i(a)\right)\right] \geq \mathbb{E}_{a \sim (\sigma_i', \sigma_{-i})}\left[u_i\left(\mathbf{p}_i(a)\right)\right],$$
i.e. $\sigma$ is a Nash equilibrium under ESR if no player can increase the expected utility of its payoffs by deviating unilaterally from $\sigma$.
And analogous for the SER criterion [radulescu2020utility]:
Definition 8 (Nash equilibrium for scalarised expected returns).
A joint strategy $\sigma$ is a Nash equilibrium in an MONFG under the scalarised expected returns criterion if for all players $i \in N$ and all alternative strategies $\sigma_i'$:
$$u_i\left(\mathbb{E}_{a \sim (\sigma_i, \sigma_{-i})}\left[\mathbf{p}_i(a)\right]\right) \geq u_i\left(\mathbb{E}_{a \sim (\sigma_i', \sigma_{-i})}\left[\mathbf{p}_i(a)\right]\right),$$
i.e. $\sigma$ is a Nash equilibrium under SER if no player can increase the utility of its expected payoffs by deviating unilaterally from $\sigma$.
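These definitions suggest a brute-force check: fix a joint strategy and scan unilateral mixed deviations. The sketch below does this under SER for the MONFG of Table 3, with the illustrative utility u(p) = p1 * p2 for both players (an assumption, not the paper's setting); a grid scan only approximates the continuous deviation space, though for two actions it covers it up to grid resolution.

```python
import itertools

# The MONFG of Table 3.
monfg = {
    ("A", "A"): ((1, 1), (0, 0)),
    ("A", "B"): ((0, 1), (1, 0)),
    ("B", "A"): ((1, 0), (0, 1)),
    ("B", "B"): ((0, 0), (1, 1)),
}
actions = ["A", "B"]

def u(p):  # illustrative utility, assumed for both players
    return p[0] * p[1]

def expected_vector(player, s_row, s_col):
    """Expected payoff vector of `player` for mixed strategies given as
    dicts mapping actions to probabilities."""
    vec = [0.0, 0.0]
    for a, b in itertools.product(actions, actions):
        w = s_row[a] * s_col[b]
        pay = monfg[(a, b)][player]
        vec[0] += w * pay[0]
        vec[1] += w * pay[1]
    return tuple(vec)

def ser(player, s_row, s_col):
    # SER: scalarise the expected payoff vector.
    return u(expected_vector(player, s_row, s_col))

def is_ser_ne(s_row, s_col, grid=101, tol=1e-9):
    """Check Definition 8 against a grid of unilateral mixed deviations."""
    base = (ser(0, s_row, s_col), ser(1, s_row, s_col))
    for k in range(grid):
        q = k / (grid - 1)
        dev = {"A": q, "B": 1 - q}
        if ser(0, dev, s_col) > base[0] + tol or ser(1, s_row, dev) > base[1] + tol:
            return False
    return True

play_A = {"A": 1.0, "B": 0.0}
print(is_ser_ne(play_A, play_A))  # True under this illustrative utility
```

An exact check would optimise over the full simplex of deviations, e.g. analytically or with a numerical solver; the grid is merely the simplest sketch.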
While it has been shown that in finite NFGs a mixed Nash equilibrium always exists [nash1951non], in an MONFG when optimising for the SER criterion and with non-linear utility functions this guarantee does not hold [radulescu2020utility]. We refer to [radulescu2020utility] for an example and formal proof that NE may not exist in MONFGs under SER. An open question that remains is then what restrictions on payoff structure or utility functions do guarantee the existence of NE in this setting.
2.6 Convexity and Quasiconvexity
In this work we consider different classes of functions, namely convex and quasiconvex functions as well as concave and quasiconcave functions, due to their useful properties. We offer below the formal definition of a convex and concave function. Note that each definition can be modified to denote strict (quasi)convexity, respectively strict (quasi)concavity, by replacing $\leq$ with $<$, respectively $\geq$ with $>$, and considering $x \neq y$ with $\alpha \in (0, 1)$.
A function $f$ is convex if its domain is a convex set and for all $x, y$ in its domain, and all $\alpha \in [0, 1]$, we have:
$$f(\alpha x + (1 - \alpha) y) \leq \alpha f(x) + (1 - \alpha) f(y).$$
A function $f$ is concave if its domain is a convex set and for all $x, y$ in its domain, and all $\alpha \in [0, 1]$, we have:
$$f(\alpha x + (1 - \alpha) y) \geq \alpha f(x) + (1 - \alpha) f(y).$$
Intuitively, convex functions can be considered functions for which the line segment between any two points lies above the graph. Conversely, the line segment between any two points on a concave function will always lie below the graph. To visually illustrate these definitions, we show an example in Figure 1.
Because the definition of such functions is relatively strict, a broader class of functions can be constructed in the form of quasiconvex and quasiconcave functions.
A function $f$ is quasiconvex if its domain is a convex set and for all $x, y$ in its domain, and all $\alpha \in [0, 1]$, we have:
$$f(\alpha x + (1 - \alpha) y) \leq \max\left(f(x), f(y)\right).$$
Observe that the definition of convex functions implies that they are also quasiconvex. However, quasiconvexity does not necessarily imply convexity. We define a quasiconcave function analogously as follows:
A function $f$ is quasiconcave if its domain is a convex set and for all $x, y$ in its domain, and all $\alpha \in [0, 1]$, we have:
$$f(\alpha x + (1 - \alpha) y) \geq \min\left(f(x), f(y)\right).$$
Here too, concavity implies quasiconcavity while the other way around does not hold. As such, these classes of functions can be seen as less restrictive. We show an example of a convex function, a quasiconvex function and an arbitrary function which possesses neither property in Figure 2.
It is important to note that any univariate function that is monotonically increasing is also both quasiconvex and quasiconcave. However, this property does not generalise to multivariate functions, i.e. $f : \mathbb{R}^n \to \mathbb{R}$ with $n > 1$.
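These definitions can be tested numerically by searching for violating triples $(x, y, \alpha)$. The sketch below confirms that the monotone univariate $f(x) = x^3$ passes a quasiconcavity test on sample points, while the monotone multivariate $f(x, y) = x^3 + y^3$ fails it; both functions are illustrative choices, not examples from the paper.

```python
import itertools

def violates_quasiconcavity(f, points, alphas):
    """Search for x, y, alpha with f(alpha*x + (1-alpha)*y) < min(f(x), f(y)).

    Returns a violating triple, or None if no violation is found."""
    for x, y in itertools.combinations(points, 2):
        for a in alphas:
            z = tuple(a * xi + (1 - a) * yi for xi, yi in zip(x, y))
            if f(z) < min(f(x), f(y)) - 1e-9:
                return (x, y, a)
    return None

alphas = [0.25, 0.5, 0.75]

# Monotone univariate f(x) = x**3 is quasiconcave (and quasiconvex).
points_1d = [(-2.0,), (-0.5,), (0.0,), (1.0,), (2.0,)]
assert violates_quasiconcavity(lambda p: p[0] ** 3, points_1d, alphas) is None

# The monotone multivariate f(x, y) = x**3 + y**3 is NOT quasiconcave:
# (2, 0) and (0, 2) both map to 8, yet interior mixtures map far lower.
points_2d = [(2.0, 0.0), (0.0, 2.0)]
witness = violates_quasiconcavity(lambda p: p[0] ** 3 + p[1] ** 3,
                                  points_2d, alphas)
assert witness is not None
```

Such a sampling check can only disprove quasiconcavity, never prove it; a returned `None` merely means no violation was found on the sampled grid.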
3 Existence Guarantees for Nash Equilibria in MONFGs
In this section, we provide sufficient guarantees for Nash equilibria to exist in MONFGs when optimising for the SER criterion. While it has previously been shown that in general, no NE need exist in this setting [radulescu2020utility], we establish existence by restricting all agents in our games to continuous quasiconcave utility functions. Furthermore, we find that restricting utility functions to be continuous quasiconvex is not a sufficient guarantee, which we demonstrate through a counterexample. However, the lemmas we introduce to construct this counterexample may be used in the future to disprove the existence of a Nash equilibrium in a given game.
We first show that if all agents are using a continuous quasiconcave utility function, a mixed strategy Nash equilibrium is guaranteed. We prove this by reducing an MONFG under SER to a single-objective infinite NFG, which extends NFGs from Definition 1 to games with infinitely many pure strategies. In such settings, it is known that if all strategy spaces are non-empty, compact and convex, and all payoff functions are continuous and quasiconcave, a Pure Strategy Nash Equilibrium (PSNE) must exist [fudenberg1991game]. This theorem is based on earlier work by [debreu1952social, glicksberg1952further, fan1952fixed] and uses the Kakutani fixed-point theorem [kakutani1941generalization] to show that there must exist a fixed point which is a PSNE in the infinite game. By carefully constructing this NFG, we can show that a pure strategy NE in the infinite single-objective NFG corresponds to a mixed strategy NE in our MONFG under SER. We formalise this result in Theorem 1.
Consider a (finite, n-player) multi-objective normal-form game where players are optimising for the scalarised expected returns criterion. If each player has a continuous quasiconcave utility function, a mixed strategy Nash equilibrium is guaranteed to exist.
Given a finite multi-objective normal-form game
$$G = (N, \mathcal{A}, \mathbf{p})$$
and for each player $i$ a continuous quasiconcave utility function $u_i : \mathbb{R}^d \to \mathbb{R}$.
We construct the following infinite normal-form game, i.e. an NFG with an infinite pure strategy space:
$$G' = (N, S, f),$$
with the same players and for each player $i$ an infinite pure strategy set $S_i$ and payoff function $f_i$. We define player $i$'s infinite pure strategy set in $G'$ as the set of mixed strategies over their action set in $G$:
$$S_i = \Delta(\mathcal{A}_i).$$
Recall from Definition 2 that $\Delta(\mathcal{A}_i)$ denotes the set of probability distributions over the action set $\mathcal{A}_i$. Next, we define each player's payoff function in $G'$ to be equivalent to the scalarised expected returns in $G$:
$$f_i(\sigma) = u_i\left(\mathbb{E}_{a \sim \sigma}\left[\mathbf{p}_i(a)\right]\right).$$
Following [fudenberg1991game], we can guarantee a pure strategy Nash equilibrium in $G'$ if:
each $S_i$ is a non-empty, compact and convex subset of Euclidean space;
all $f_i$ are continuous in $\sigma$;
all $f_i$ are quasiconcave in $\sigma_i$.
First, we defined each $S_i$ in $G'$ to be the set of mixed strategies over a finite set of actions in $G$. Therefore, each $S_i$ is a simplex and by definition a compact and convex subset of Euclidean space. Second, we assume each $u_i$ to be continuous, making $f_i$ also continuous in $\sigma$. Third, we can show that because $u_i$ is quasiconcave in the payoff, $f_i$ is also quasiconcave in $\sigma_i$. Because $u_i$ is quasiconcave, for all vectors $\mathbf{x}, \mathbf{y}$ in its domain, and all $\alpha \in [0, 1]$, we have
$$u_i\left(\alpha \mathbf{x} + (1 - \alpha) \mathbf{y}\right) \geq \min\left(u_i(\mathbf{x}), u_i(\mathbf{y})\right).$$
Let $\sigma_i, \sigma_i' \in S_i$ and $\sigma_i^\alpha = \alpha \sigma_i + (1 - \alpha) \sigma_i'$; then
$$f_i\left(\sigma_i^\alpha, \sigma_{-i}\right) = u_i\left(\mathbb{E}_{\sigma_i^\alpha, \sigma_{-i}}\left[\mathbf{p}_i\right]\right).$$
By the linearity of expectation we have
$$\mathbb{E}_{\sigma_i^\alpha, \sigma_{-i}}\left[\mathbf{p}_i\right] = \alpha\, \mathbb{E}_{\sigma_i, \sigma_{-i}}\left[\mathbf{p}_i\right] + (1 - \alpha)\, \mathbb{E}_{\sigma_i', \sigma_{-i}}\left[\mathbf{p}_i\right],$$
so we may now state that
$$f_i\left(\sigma_i^\alpha, \sigma_{-i}\right) \geq \min\left(f_i\left(\sigma_i, \sigma_{-i}\right), f_i\left(\sigma_i', \sigma_{-i}\right)\right),$$
which is the requirement for quasiconcavity of $f_i$ in $\sigma_i$. As we satisfy the three conditions, we know that $G'$ must have a pure strategy Nash equilibrium. We now show that this pure strategy Nash equilibrium in $G'$ corresponds to a Nash equilibrium in $G$.
Observe that a pure strategy Nash equilibrium in $G'$ implies that
$$f_i\left(\sigma_i, \sigma_{-i}\right) \geq f_i\left(\sigma_i', \sigma_{-i}\right)$$
for all players $i$ and alternative pure strategies $\sigma_i' \in S_i$. Given our definition of $f_i$, we can substitute and arrive at the following equation:
$$u_i\left(\mathbb{E}_{\sigma_i, \sigma_{-i}}\left[\mathbf{p}_i\right]\right) \geq u_i\left(\mathbb{E}_{\sigma_i', \sigma_{-i}}\left[\mathbf{p}_i\right]\right)$$
for all players $i$ and alternative strategies $\sigma_i'$. Because each strategy in $G'$ is defined as a mixed strategy in $G$, we obtain precisely the definition of a Nash equilibrium in an MONFG under SER. ∎
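The quasiconcavity step of the proof can be sanity-checked numerically. Using the MONFG of Table 3 and the illustrative quasiconcave (in fact concave) utility $u(p) = \min(p_1, p_2)$, the induced payoff, the utility of the expected payoff vector, should satisfy the quasiconcavity inequality along any mixture of the row player's strategies.

```python
import random

# Row player's payoff vectors from the MONFG of Table 3.
row_pay = {
    ("A", "A"): (1, 1),
    ("A", "B"): (0, 1),
    ("B", "A"): (1, 0),
    ("B", "B"): (0, 0),
}

def u(p):
    # Illustrative quasiconcave (concave) utility.
    return min(p)

def f_row(q, c):
    """Utility of the row player's expected payoff vector, with
    q = P(row plays A) and c = P(column plays A)."""
    vec = [0.0, 0.0]
    for (a, b), pay in row_pay.items():
        w = (q if a == "A" else 1 - q) * (c if b == "A" else 1 - c)
        vec[0] += w * pay[0]
        vec[1] += w * pay[1]
    return u(vec)

random.seed(0)
c = 0.3  # an arbitrary fixed opponent strategy
for _ in range(1000):
    s, t, alpha = random.random(), random.random(), random.random()
    mixed = alpha * s + (1 - alpha) * t
    # Quasiconcavity of the induced payoff in the row player's own strategy.
    assert f_row(mixed, c) >= min(f_row(s, c), f_row(t, c)) - 1e-9
```

The check succeeds because the expected payoff vector is linear in the player's own mixing probability, so composing it with a quasiconcave utility preserves quasiconcavity, which is exactly the argument of the proof.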
It is worth briefly discussing what the proposed restriction intuitively means. It is known from utility theory that having a quasiconcave utility function leads to having convex preferences over the objectives. This can be understood as a player favouring an average return over their objectives more than an extreme return on just one objective. This appears a sensible restriction, as quasiconcave utility functions are also often considered in an economic context and are known to generate well-behaved preferences [varian2014intermediate].
We now show that restricting players to use only continuous quasiconvex utility functions does not guarantee a Nash equilibrium to exist. To prove this result, we first introduce three necessary lemmas. Lemma 2 demonstrates that when employing only quasiconvex utility functions, a pure strategy is always a best response. Next, Lemma 3 is a purely mathematical statement that is later used in Lemma 4 but for which we were not able to find a reference. We present it here as it may be of independent value to others. Lastly, Lemma 4 combines the previous statements to show that whenever all utility functions are strictly quasiconvex, a mixed strategy can only be a best response if the expected payoff vector is equal for all actions that are played with probability greater than zero. We highlight that Lemmas 2 and 4 in particular can be straightforwardly applied to any given MONFG with strictly quasiconvex utility functions to show whether a Nash equilibrium exists.
Lemma 2 (Optimality of a pure strategy as a best response).
Consider a (finite, $n$-player) multi-objective normal-form game where players are optimising for the scalarised expected returns criterion. If each player $i$ has a quasiconvex utility function $u_i$, there must always exist an action $a_i \in \mathcal{A}_i$ that is a best response to the strategy profile of all other players $\sigma_{-i}$, such that for all alternative strategies $\sigma_i$:
$$u_i\left(\mathbb{E}_{a_i, \sigma_{-i}}\left[\mathbf{p}_i\right]\right) \geq u_i\left(\mathbb{E}_{\sigma_i, \sigma_{-i}}\left[\mathbf{p}_i\right]\right).$$
A mixed strategy $\sigma_i$ assigns a probability $\sigma_i(a)$ to each $a \in \mathcal{A}_i$, such that
$$\sum_{a \in \mathcal{A}_i} \sigma_i(a) = 1.$$
The expected reward vector for $(\sigma_i, \sigma_{-i})$ is further defined as:
$$\mathbb{E}_{\sigma_i, \sigma_{-i}}\left[\mathbf{p}_i\right] = \sum_{a \in \mathcal{A}_i} \sigma_i(a)\, \mathbb{E}_{a, \sigma_{-i}}\left[\mathbf{p}_i\right].$$
Note that the expected reward vector is thus a convex combination of the expected payoff vectors of the individual actions.
We claim that when assuming only quasiconvex utility functions, a pure strategy exists which is a best response. In other words, when applying a quasiconvex function $u_i$ to a convex combination of vectors, the result is bounded by the maximum of $u_i$ applied to a single vector. Formally:
$$u_i\left(\sum_{a \in \mathcal{A}_i} \sigma_i(a)\, \mathbb{E}_{a, \sigma_{-i}}\left[\mathbf{p}_i\right]\right) \leq \max_{a \in \mathcal{A}_i} u_i\left(\mathbb{E}_{a, \sigma_{-i}}\left[\mathbf{p}_i\right]\right).$$
In fact, this is a known property of quasiconvex functions [dragomir2012jensen], which proves Lemma 2. ∎
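The bound at the heart of Lemma 2 can be probed numerically. The sketch below uses the convex, hence quasiconvex, utility $u(p) = p_1^2 + p_2^2$ (an illustrative choice) and random convex combinations of random payoff vectors.

```python
import random

def u(p):
    # Convex, hence quasiconvex, illustrative utility.
    return p[0] ** 2 + p[1] ** 2

random.seed(1)
for _ in range(1000):
    # Four random payoff vectors and a random probability distribution.
    vectors = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(4)]
    weights = [random.random() for _ in range(4)]
    total = sum(weights)
    weights = [w / total for w in weights]

    combo = (sum(w * v[0] for w, v in zip(weights, vectors)),
             sum(w * v[1] for w, v in zip(weights, vectors)))

    # The utility of the convex combination never exceeds the best
    # utility among the individual vectors.
    assert u(combo) <= max(u(v) for v in vectors) + 1e-9
```

Interpreted in game terms, the vectors play the role of the expected payoff vectors of the individual actions, and the weights play the role of the mixed strategy: mixing can never beat the best single action under a quasiconvex utility.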
Lemma 3 (Jensen’s inequality for strictly quasiconvex functions).
Let $f$ be a strictly quasiconvex function and $x_1, \dots, x_n$ points in its domain, not all equal. Then for all weights $\alpha_1, \dots, \alpha_n > 0$ with $\sum_{k=1}^{n} \alpha_k = 1$:
$$f\left(\sum_{k=1}^{n} \alpha_k x_k\right) < \max_{k} f(x_k).$$
We prove this by induction on the number of points. Note that it is a straightforward extension of Jensen’s inequality for quasiconvex functions [dragomir2012jensen].
For $n = 2$, it follows directly from the definition of a strictly quasiconvex function. Let us assume that the inequality holds when $x_1, \dots, x_{n-1}$ are distinct and prove the inequality for $x_1, \dots, x_n$ all distinct. Let $\alpha = \sum_{k=1}^{n-1} \alpha_k$, and
$$y = \sum_{k=1}^{n-1} \frac{\alpha_k}{\alpha} x_k.$$
We can discern two possibilities. First, when $y = x_n$ we have from the induction hypothesis:
$$f\left(\sum_{k=1}^{n} \alpha_k x_k\right) = f(y) < \max_{k \leq n-1} f(x_k) \leq \max_{k \leq n} f(x_k).$$
Alternatively, when $y \neq x_n$, by the definition of strict quasiconvexity and the induction hypothesis:
$$f\left(\sum_{k=1}^{n} \alpha_k x_k\right) = f\left(\alpha y + (1 - \alpha) x_n\right) < \max\left(f(y), f(x_n)\right) \leq \max_{k \leq n} f(x_k).$$
Observe that we have shown Lemma 3 holds when all points are distinct. When some $x_k$'s are equal, we can group the expression into a sum $\sum_{j} \beta_j x_j$ where all $x_j$'s are distinct and each $\beta_j$ sums the weights of the equal points. In this case, the result holds due to the induction hypothesis. ∎
Lemma 4 (Mixed strategies as a best response).
Consider a (finite, $n$-player) multi-objective normal-form game where players are optimising for the scalarised expected returns criterion. If each player $i$ has a strictly quasiconvex utility function $u_i$, a mixed strategy can only be a best response when the actions that are played with probability greater than zero have equal expected returns given the strategy profile of all other players $\sigma_{-i}$.
From Lemma 2, we know that a pure strategy will always be a best response when employing a quasiconvex utility function. Lemma 3 implies that for strictly quasiconvex utility functions, a mixed strategy will always return a lower utility than this optimal pure strategy when not all expected payoffs are equal for actions that are played with probability greater than zero. Therefore, the only case where a mixed strategy may be optimal is when such a mixed strategy combines only actions that have equal expected payoffs. ∎
We can now contribute Theorem 5, which states that Nash equilibria may fail to exist when restricting players to use only strictly convex utility functions. Note that this immediately proves that no Nash equilibrium can be guaranteed when using only convex or quasiconvex utility functions, as these classes, by definition, contain the set of strictly convex functions.
Consider a (finite, n-player) multi-objective normal-form game where players are optimising for the scalarised expected returns criterion. If each player has a strictly convex utility function, a mixed strategy Nash equilibrium is not guaranteed to exist.
Consider the game in Table 4 and the following utility function which is used by both players:
This utility function is strictly convex, and therefore also (continuous) strictly quasiconvex. From Lemma 4 we know that with such utility functions, a mixed strategy can only be a best response when all actions that are played with a probability greater than zero have equal expected payoff vectors. Because each player has only two actions, a mixed strategy could thus only be a best response if both actions yield the same expected payoff vector against the opponent's strategy. Observe that this is never possible for either player due to the imbalanced payoff vectors. As such, we may restrict the set of possible Nash equilibria to pure strategy Nash equilibria. It is trivial to verify that no pure strategy Nash equilibrium exists. Therefore, no Nash equilibrium exists. ∎
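Since Table 4 and its utility function are not reproduced in this excerpt, the sketch below instead builds a hypothetical 2x2 MONFG in the same spirit, together with the strictly convex utility $u(p) = p_1^2 + p_2^2$ (both are illustrative assumptions): its trade-off game is a matching-pennies pattern with no pure strategy NE, and the expected payoff vectors of the two actions never coincide, so by Lemma 4 no mixed strategy can be a best response either.

```python
def u(p):
    # Strictly convex illustrative utility.
    return p[0] ** 2 + p[1] ** 2

# Hypothetical MONFG: pay[(row_action, col_action)] = (row vector, col vector).
pay = {
    ("A", "A"): ((2, 0), (1, 0)),
    ("A", "B"): ((1, 0), (0, 2)),
    ("B", "A"): ((0, 1), (2, 0)),
    ("B", "B"): ((0, 2), (0, 1)),
}
acts = ["A", "B"]

# 1) The trade-off game follows a matching-pennies pattern: no pure NE.
def pure_ne(ra, ca):
    row_ok = all(u(pay[(ra, ca)][0]) >= u(pay[(r, ca)][0]) for r in acts)
    col_ok = all(u(pay[(ra, ca)][1]) >= u(pay[(ra, c)][1]) for c in acts)
    return row_ok and col_ok

assert not any(pure_ne(r, c) for r in acts for c in acts)

# 2) By Lemma 4, a mixed strategy can only be a best response if the supported
# actions have equal expected payoff vectors; here they never coincide.
for k in range(101):
    c = k / 100  # P(column player plays A)
    row_A = (c * pay[("A", "A")][0][0] + (1 - c) * pay[("A", "B")][0][0],
             c * pay[("A", "A")][0][1] + (1 - c) * pay[("A", "B")][0][1])
    row_B = (c * pay[("B", "A")][0][0] + (1 - c) * pay[("B", "B")][0][0],
             c * pay[("B", "A")][0][1] + (1 - c) * pay[("B", "B")][0][1])
    assert row_A != row_B

# Hence every best response is pure, yet no pure profile is stable:
# this hypothetical game has no Nash equilibrium under SER.
```

This mirrors the structure of the proof: Lemma 4 eliminates mixed best responses, and the trade-off game then decides the existence question among pure profiles.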
In (single-objective) NFGs, there exist efficient methods for retrieving Nash equilibria for any given game [lemke1964equilibrium, porter2008simple]. By providing sufficient guarantees for NE existence in MONFGs, and importantly what restrictions do not suffice, we open up the possibility for computational methods in this setting as well. Furthermore, most algorithms in multi-agent reinforcement learning (MARL) make the agents converge to a Nash equilibrium [busoniu2008comprehensive, nowe2012game]. Theorem 1 guarantees the existence of Nash equilibria given certain restrictions, thus opening the possibility for new learning algorithms to be developed that also aim to converge to them. We discuss these ideas as possible directions for future work in Section 9.
4 Equilibrium Relations Between Optimisation Criteria
In this section, we shift our focus from the existence of Nash equilibria to the relations between Nash equilibria under both optimisation criteria. Specifically, we explore whether the number of Nash equilibria in an MONFG under SER equals that of the trade-off game and whether any Nash equilibria need necessarily be shared. We find in both cases that, even when a Nash equilibrium exists under each criterion, the results are negative and there is no guaranteed relation between the two in general.
We first contribute a theorem stating that in an MONFG, the number of NE under SER can be different from the number of NE under ESR, even when both have NE.
Consider a (finite, n-player) multi-objective normal-form game with at least one Nash equilibrium under both optimisation criteria. The size of the sets of Nash equilibria under the scalarised expected returns criterion and under the expected scalarised returns criterion need not be equal.
We can prove this theorem by constructing an MONFG that has this property. The MONFG we use for this purpose can be seen in Table 5. Below this MONFG, we show the single-objective NFG under ESR resulting from directly applying the utility function in Equation 5 to the payoffs, assuming that both agents use this same utility function.
To begin, let us show the NE in the MONFG under ESR. We do this by first applying the utility functions for each agent, which in this case happen to be the same, directly to the payoff vectors in the MONFG. The resulting single-objective NFG can be seen in Table 4(b). We then observe that the pure strategy profile (A, A) results in the highest utilities for both agents. As such, there is no incentive for agents to play a mixed strategy when the other agent plays A at least part of the time, leading to the pure strategy NE of (A, A). Additionally, (B, B) is not a NE, as there is an incentive for either agent to play A, which increases their utility. This then again leads both agents to adapt their strategies to the NE of (A, A), making it the only NE of the MONFG under ESR.
Next, we discuss the NE for the MONFG under SER (4(a)). First note that the pure strategy NE of (A, A) under ESR is not a NE under SER. To see this, observe that when one agent plays A deterministically, the best response for the other agent is a mixed strategy that plays both A and B with positive probability, yielding a strictly higher utility. In fact, the resulting joint strategy constitutes a NE under SER for this game, as no agent has an incentive to deviate from it. A second NE occurs when the agents switch strategies, resulting in the same payoffs. Note that both agents receive the same expected payoff vectors and apply the same utility function to these. We can also show that the pure strategy (B, B) is not a NE, as either agent can improve upon it by deterministically playing A. As such, the MONFG in Table 5 has at least two mixed strategy NE under SER and no pure strategy NE.
In this MONFG, both the game under SER and ESR have NE. However, we can see that they have a different number of NE, proving Theorem 6. ∎
We can shift our focus from comparing the size of the sets of Nash equilibria, to the equilibria themselves. Specifically, it is interesting to see whether a Nash equilibrium must necessarily be shared by both optimisation criteria. Here too we are able to show that no such relation exists in general. We formalise this in Theorem 7.
Consider a (finite, n-player) multi-objective normal-form game with at least one Nash equilibrium under both optimisation criteria. The set of Nash equilibria under the scalarised expected returns criterion and the set of Nash equilibria under the expected scalarised returns criterion may be disjoint.
We can show this using the same construction as provided for Theorem 6. In this construction, we already highlighted the only Nash equilibrium for the MONFG under ESR, namely (A, A). Moreover, we observe that this joint strategy is not a Nash equilibrium under SER, as there is an incentive for either agent to deviate to playing the mixed strategy . As the MONFG has no other NE under ESR, no Nash equilibrium is shared in this construction.
5 Pure Strategy Nash Equilibria in MONFGs
In the previous section, Theorem 7 stated that Nash equilibria need not be shared between optimisation criteria. A natural follow-up question is under what circumstances this does occur. We expand on this issue and first show that a Pure Strategy Nash Equilibrium (PSNE) under SER must always be a PSNE under ESR as well. Furthermore, we demonstrate that the converse does not hold by providing a counterexample. However, we prove that adding the assumption that all utility functions in the MONFG are quasiconvex does ensure that a PSNE under ESR is also a PSNE under SER. Finally, as a direct result of the new theorems, we can show that the sets of PSNE under ESR and SER are the same when assuming only quasiconvex utility functions.
In order to show that a pure strategy NE under SER must necessarily be a pure strategy NE under ESR, we introduce Lemma 8. This lemma states that the utility of a pure strategy profile under SER is the same as the utility of that pure strategy profile under ESR.
Lemma 8 (Utility of a pure strategy).
Given a pure strategy profile in a (finite, n-player) multi-objective normal-form game, the scalarised expected returns will always equal the expected scalarised returns.
Consider a pure strategy profile . We know that the observed payoff vector for this strategy will always be the same, even for multiple executions of the same strategy, as there is no randomisation over the actions. Because the expectation of a constant is equal to that constant we may now state:
and given a utility function , the expected utility also equals the received utility by the same reasoning:
We can thus say that for a pure strategy profile, the utility of a payoff under SER equals the utility under ESR:
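Lemma 8 can be checked numerically for any concrete instance: a pure strategy is a degenerate mixed strategy, so the expectation is taken over a single deterministic outcome. The payoff vector and utility function below are hypothetical; the equality itself is what the lemma asserts.

```python
import numpy as np

payoff = np.array([3.0, 1.0])        # hypothetical deterministic payoff vector
u = lambda v: v[0] ** 2 + v[1]       # hypothetical non-linear utility function

# A pure strategy is a degenerate distribution: one outcome, probability 1.
probs = np.array([1.0])
outcomes = payoff[np.newaxis, :]

ser = float(u(probs @ outcomes))                      # utility of the expected vector
esr = float(np.dot(probs, [u(v) for v in outcomes]))  # expectation of the utilities
print(ser == esr)  # True: both equal 10.0
```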
We note the importance of deterministic payoffs in MONFGs for Lemma 8. If stochastic payoffs were allowed, the lemma, and by extension all other results in this section, would not hold. Given this lemma, we now state the first theorem of this section, which asserts that a PSNE under SER must always be a PSNE under ESR as well. In other words, the pure strategy Nash equilibria found in an MONFG under SER must also be Nash equilibria in the trade-off game.
Consider a (finite, n-player) multi-objective normal-form game with a pure strategy Nash equilibrium under the scalarised expected returns criterion. This joint pure strategy must necessarily also be a Nash equilibrium under the expected scalarised returns criterion.
Given a pure strategy Nash equilibrium under SER , we can say that:
(i.e., a pure strategy Nash equilibrium under ESR)
The proof starts with the general definition of a pure strategy Nash equilibrium under SER and removes the expected values where possible in line two. In line three, we remark that if the pure strategy profile is an NE, it must necessarily also be equal to or better than unilaterally playing any other pure strategy. In lines four and five, this leads us to state that the utility of the pure strategy NE is greater than or equal to the optimal convex combination of the utilities of the other pure strategies, because such a combination of scalars is bounded by the maximum of these scalars. In line six, we can freely reintroduce the expected value on the left-hand side of the inequality and rewrite the right-hand side such that it now reflects the expected scalarised returns. This final inequality is precisely the definition of a Nash equilibrium under ESR. Given this positive result, it is alluring to believe that the converse, i.e. going from ESR to SER, would also hold. However, this is not the case, as we can only guarantee that the utility of a pure strategy profile is greater than or equal to the optimal stochastic mixture of scalar utilities. We cannot guarantee that it is better than the utility of the optimal stochastic mixture of payoff vectors.
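The bounding step in lines four and five relies only on the elementary fact that a convex combination of scalars never exceeds their maximum; in symbols (with $p_i$ the mixing probabilities and $u_i$ the scalarised utilities of the pure strategies, notation chosen here for illustration):

```latex
\sum_{i} p_i u_i \;\leq\; \max_{i} u_i , \qquad p_i \geq 0, \quad \sum_{i} p_i = 1 .
```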
Consider a (finite, n-player) multi-objective normal-form game with a pure strategy Nash equilibrium under the expected scalarised returns criterion. This joint pure strategy need not be a Nash equilibrium under the scalarised expected returns criterion.
We show this theorem using the same MONFG and utility functions as presented in the proof of Theorems 6 and 7. Recall that in this game, the pure strategy profile (A, A) was a PSNE under ESR. This joint strategy was not a PSNE under SER, as the best response to a player opting for A was to play A with probability and B with probability . ∎
We add that this negative result can be remedied with an additional assumption. Concretely, by assuming that all utility functions used by the players in the game are quasiconvex, we now show that a PSNE under ESR must also be a PSNE under SER. Note that, contrary to Theorem 1, we do not need to restrict the utility functions to be continuous.
Consider a (finite, n-player) multi-objective normal-form game with a pure strategy Nash equilibrium under the expected scalarised returns criterion. If all players have a quasiconvex utility function, this joint pure strategy must necessarily also be a Nash equilibrium under the scalarised expected returns criterion.
We first highlight that the construction created in the proof of Theorem 10 uses a non-quasiconvex utility function. We observe this by applying the definition of quasiconvexity from Definition 11 to this utility function. If we take for example , and , then we get for the left-hand side and for the right-hand side. It is clear then that this is not a quasiconvex utility function, as is larger than . We now present the proof of Theorem 11.
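The definitional check performed here can be sketched in code. The utility function below is a hypothetical concave one, not the function from the earlier construction; the helper simply tests whether a given triple violates the quasiconvexity condition u(λx + (1−λ)y) ≤ max(u(x), u(y)).

```python
import math

def u(vec):
    # Hypothetical concave utility; concave functions are generally
    # quasiconcave but need not be quasiconvex.
    return math.sqrt(vec[0]) + math.sqrt(vec[1])

def violates_quasiconvexity(u, x, y, lam):
    # Quasiconvexity demands u(lam*x + (1-lam)*y) <= max(u(x), u(y)).
    mix = tuple(lam * a + (1 - lam) * b for a, b in zip(x, y))
    return u(mix) > max(u(x), u(y))

x, y = (1.0, 0.0), (0.0, 1.0)
print(violates_quasiconvexity(u, x, y, 0.5))  # True: u(0.5, 0.5) ~ 1.414 > 1.0
```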
Given a pure strategy Nash equilibrium under ESR and for each player a quasiconvex utility function :
(i.e., a pure strategy Nash equilibrium under SER)
Here too, we present a textual walkthrough of the proof. Initially, we introduce the definition of a Nash equilibrium under ESR. In line two, we observe that because we are dealing with pure strategies, we may eliminate the expectation from the left-hand side following Lemma 8. On the right-hand side, we state that the utility of this PSNE must equal the utility of the best-response pure strategy. This follows directly from the definition of a (pure strategy) Nash equilibrium.
Given the extension of Jensen’s inequality [jensen1906sur] to quasiconvex functions [dragomir2012jensen], lines three and four show that, with the assumed quasiconvex utility functions, any convex combination of payoff vectors has a utility bounded by the maximum utility of one such vector. Line five subsequently rewrites the right-hand side as the SER of any alternative strategy . In line six, we reintroduce the left-hand side from earlier, while keeping the inequality intact. Line seven removes the maximum, after which we reintroduce the expectation on the left-hand side. By doing this, we have arrived at the definition of a NE under SER, proving Theorem 11.
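The quasiconvex analogue of Jensen’s inequality invoked in lines three and four can be sketched as follows, where $u$ is quasiconvex, the $\mathbf{v}_i$ are payoff vectors and $p$ is a probability vector (notation chosen here for illustration):

```latex
u\!\left( \sum_{i} p_i \mathbf{v}_i \right) \;\leq\; \max_{i} \, u(\mathbf{v}_i),
\qquad p_i \geq 0, \quad \sum_{i} p_i = 1 .
```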
From Theorem 9 and 11, we are able to present our final result of this section, observing that when assuming only quasiconvex utility functions for players in an MONFG, the set of PSNE under SER is equal to the set of PSNE under ESR. We state this formally in Theorem 12.
Consider a (finite, n-player) multi-objective normal-form game where each player has a quasiconvex utility function. The set of pure strategy Nash equilibria under the expected scalarised returns criterion is equal to the set of pure strategy Nash equilibria under the scalarised expected returns criterion.
This theorem has important implications. Recall that an MONFG under the ESR criterion can always be reduced to an NFG [radulescu2020utility]. We now know that in an MONFG where all players have a quasiconvex utility function, the set of PSNE under both criteria is equal. This indicates that if all utility functions are quasiconvex, we can retrieve pure strategy Nash equilibria for any multi-objective game by looking at the trade-off game. This equivalence lends itself to a novel computational method for retrieving all pure strategy Nash equilibria in multi-objective games. We show this approach in Section 7.
In addition, Theorem 12 allows us to provide additional existence guarantees for Nash equilibria in MONFGs. Specifically, for any class of NFGs for which the set of PSNE is guaranteed to be non-empty, we can subsequently guarantee a PSNE for any MONFG whose scalarisation with quasiconvex utility functions leads to a game in that original class. Such guarantees for NFGs exist, with, for example, the results shown by [rosenthal1973class].
6 Blended Settings in MONFGs
In this section, we briefly move away from comparing the Nash equilibria in MONFGs under both criteria and discuss what happens when allowing for a heterogeneous set of players optimising for different criteria, i.e. some players optimising for the ESR criterion, while others are optimising for SER. We refer to this setting as a blended setting and to players in such a setting as a blended set of players. Lastly, we use the term player distribution to denote the proportions of players optimising for either SER or ESR.
Such settings are interesting to study, as they naturally arise in the real world. To provide a practical example, consider the scenario of a young couple attempting to buy their first home using a real estate agent [radulescu2021decision]. In this scenario, the couple is almost surely interested in optimising for the expected utility (ESR), as buying a home is not an action one typically repeats many times. The real estate agent on the other hand might care more about the utility of the expected returns (SER), as their job depends on the successful sale of multiple houses. While this setting is surely worth attention, it is left almost completely unexplored and presents an interesting direction for future work [radulescu2020multi].
We first contribute a novel definition for a Nash equilibrium in a blended setting below. Intuitively speaking, this definition states that no agent can improve on their individual criterion by unilaterally deviating from the joint strategy.
Definition 13 (Nash equilibrium in a blended setting).
A joint strategy leads to a Nash equilibrium in a blended setting if, for each agent optimising for the ESR criterion and each agent optimising for the SER criterion, and for all alternative strategies and :
i.e. is a Nash equilibrium in a blended setting if no agent optimising for the ESR criterion can increase the expected utility of its payoffs and no agent optimising for the SER criterion can increase the utility of its expected payoffs by deviating unilaterally from .
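In symbols, the two conditions can be sketched as follows, writing $\pi^*$ for the joint strategy, $\mathbf{p}_i(\cdot)$ for player $i$'s vectorial payoff function, $u_i$ for their utility function and $\pi^*_{-i}$ for the strategies of all other players (notation assumed here for illustration):

```latex
\mathbb{E}\!\left[ u_i\!\left( \mathbf{p}_i(\pi_i', \pi^*_{-i}) \right) \right]
\;\leq\; \mathbb{E}\!\left[ u_i\!\left( \mathbf{p}_i(\pi^*) \right) \right]
\quad \text{for all ESR agents } i,
```

```latex
u_j\!\left( \mathbb{E}\!\left[ \mathbf{p}_j(\pi_j', \pi^*_{-j}) \right] \right)
\;\leq\; u_j\!\left( \mathbb{E}\!\left[ \mathbf{p}_j(\pi^*) \right] \right)
\quad \text{for all SER agents } j.
```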
In a best case scenario, it would be possible to calculate or learn Nash equilibria for any given MONFG with a blended set of players without knowing the distribution of these players. Specifically, this would mean that the game is robust to any changes in the player distribution and shows some static characteristics. Luckily, we are able to use the theorems provided in Section 5 to derive several such properties. We first observe that if a PSNE exists in an MONFG where every player is optimising for the SER criterion, it is also guaranteed to be a PSNE in any blended setting. Formally:
Consider a (finite, n-player) multi-objective normal-form game where players are optimising for the scalarised expected returns criterion. A pure strategy Nash equilibrium in this setting is also a Nash equilibrium in any blended setting.
The proof follows directly from Theorem 9. Starting with this theorem, we know that any pure strategy that is a Nash equilibrium under SER is also a NE under ESR. Given an MONFG with a PSNE under SER, players optimising for this criterion in a blended setting have no incentive to deviate. As it is also a PSNE under ESR, the players optimising for this criterion do not have an incentive to deviate either. As such, this strategy presents a PSNE in any blended setting.
Note that, following the same logic, we can also guarantee that any PSNE in a blended setting is a PSNE in the trade-off game. Observe that such a PSNE is not altered by players shifting from the SER criterion to ESR, as it already was a PSNE under ESR. In addition, we can use the theorems from Section 5 to show that if a PSNE exists in a game scalarised using quasiconvex utility functions, it must necessarily also be a PSNE in any blended setting. We formalise this in Corollary 14.
Consider a (finite, n-player) multi-objective normal-form game where each player has a quasiconvex utility function. A pure strategy Nash equilibrium under the expected scalarised returns criterion is also a Nash equilibrium in any blended setting.
From Theorem 12, we know that when assuming only quasiconvex utility functions, a PSNE under ESR is necessarily also a PSNE under SER. Thus, by the same logic used in the previous proof, no player can improve on their particular criterion by deviating. As such, we have a pure strategy Nash equilibrium in any blended setting.
We can use this corollary in a final theorem. Intuitively, we observe that the set of PSNE in any blended setting with players employing quasiconvex utility functions is equal to the set of PSNE in the trade-off game. Formally:
Consider a (finite, n-player) multi-objective normal-form game where each player has a quasiconvex utility function. The set of pure strategy Nash equilibria in the trade-off game is equal to the set of pure strategy Nash equilibria in any blended setting.
This follows directly from Theorem 12 and Corollary 14. First, Corollary 14 guarantees that a PSNE in the trade-off game, i.e. under ESR, is also a PSNE in any blended setting. Conversely, a PSNE in a blended setting is a best response under each player's own criterion; by Theorem 9 for the SER players, and directly for the ESR players, it is then also a PSNE in the trade-off game. Theorem 12 subsequently guarantees that these PSNE are also PSNE under SER.
The results presented in this section have several important implications. Specifically, we now guarantee that in certain situations we can determine pure strategy Nash equilibria for blended settings a priori, without even knowing the distribution of players optimising for either criterion. In particular, Theorem 15 shows that when players are employing quasiconvex utility functions, the set of PSNE in a blended setting corresponds to that of the trade-off game. In the following section, we use this property to derive a general algorithm for calculating PSNE in MONFGs under ESR, SER or indeed any blended setting given quasiconvex utility functions.
7 Algorithmic Implications
As a final contribution, we propose a novel algorithm for finding all pure strategy Nash equilibria in finite n-player MONFGs, given quasiconvex utility functions. From this point onward, we will only consider such utility functions when discussing the algorithm. We highlight that the approach laid out in this section will be shown to operate correctly for any optimisation criterion or blended setting. We provide a pseudocode implementation in Algorithm 1. A functioning implementation can be found at https://github.com/wilrop/Nash-Equilibria-MONFG.
The algorithm is designed in two sequential components. First, we use the fact that when optimising for the ESR criterion in an MONFG, the game can be trivially reduced to a single-objective NFG as shown by [radulescu2020utility]. The input MONFG is thus scalarised using the provided utility functions following this construction method. Specifically, given an MONFG , we can construct a single-objective NFG where and are the same and the payoff function for the NFG is . We subsequently define each as the composition between player ’s utility function and their vectorial payoff function . This construction is defined in the function reduce_monfg from line 1 to 6.
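A minimal sketch of this reduction, assuming each player's payoffs are stored as a NumPy array whose last axis holds the objective values (this data layout and the function name are our own convention, not the paper's pseudocode verbatim):

```python
import numpy as np

def reduce_monfg(monfg, utilities):
    """Scalarise an MONFG into its single-objective trade-off game.

    monfg: one array per player of shape (|A_1|, ..., |A_n|, d), holding
           that player's d-dimensional payoff vectors per joint action.
    utilities: one utility function per player, mapping a payoff vector
               to a scalar utility.
    """
    # Apply each player's utility to their payoff vectors (last axis),
    # producing one scalar payoff array per player.
    return [np.apply_along_axis(u, -1, payoffs)
            for payoffs, u in zip(monfg, utilities)]
```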
Second, from this reduction it follows that we can use algorithms for single-objective NFGs to retrieve all PSNE in the original MONFG. If one or more such PSNE exist, this further implies that the same strategies are PSNE in the MONFG under ESR [radulescu2020utility]. By Theorem 11 we then also have a PSNE under SER and by Corollary 14 a PSNE in any blended setting. Lastly, we know by Theorems 12 and 15 that the set of PSNE retrieved by this method necessarily contains all PSNE in the MONFG, irrespective of the player distribution. We implement a straightforward method for finding all PSNE in an NFG in the function find_all_PSNE from line 8 to 16.
The algorithm sequentially goes through these two parts. We first call the reduction in line 17, after which we retrieve all PSNE in line 18. One important benefit of this approach is that the complexity does not increase when including more objectives. Instead, only the runtime is increased when performing the initial scalarisation for more objectives. This makes a game with objectives effectively as hard to solve as a game with 2 objectives.
To find all pure strategy Nash equilibria in the single-objective NFG we use the function find_all_PSNE. In Algorithm 1, we show the classical "underlining" approach where we enumerate all joint pure strategies and check which are best-responses to each other. We highlight that when the induced single-objective game has a specific form, one could use a tailored algorithm for finding all pure strategy Nash equilibria. For example, the class of generic normal-form games has an algorithm capable of efficiently retrieving all Nash equilibria, which can be adapted to search for PSNE first [herings2005globally]. We add that parallelism could also be employed to further speed up the required computations.
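A sketch of the enumeration, assuming the scalarised game is given as one NumPy payoff array per player (again an illustrative convention rather than the paper's pseudocode): a joint pure strategy is kept when every player's payoff is maximal among their own unilateral deviations.

```python
import itertools
import numpy as np

def find_all_psne(nfg):
    """Enumerate all pure strategy Nash equilibria via best-response checks.

    nfg: one scalar payoff array per player, each of shape (|A_1|, ..., |A_n|).
    """
    shape = nfg[0].shape
    psne = []
    for joint in itertools.product(*(range(n) for n in shape)):
        # Player i's payoff must be maximal over their own deviations,
        # holding the other players' actions fixed ("underlining").
        if all(nfg[i][joint] >= nfg[i].max(axis=i)[joint[:i] + joint[i + 1:]]
               for i in range(len(shape))):
            psne.append(joint)
    return psne
```

For a two-player game this reduces to underlining the best responses in each column for the row player and in each row for the column player, keeping the cells underlined twice.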
Lastly, we note the fact that Algorithm 1 can be trivially modified to find a single sample pure strategy Nash equilibrium, rather than all PSNE. This can be achieved by first performing the initial reduction to a single-objective NFG and subsequently applying a method for finding a sample PSNE rather than all PSNE. Porter et al. [porter2008simple] propose such an algorithm that first checks for PSNE and as such can be modified to stop after finding one or having excluded all pure strategies.
8 Related Work
Multi-objective normal-form games were introduced by [blackwell1954analog] and have since been considered with many different approaches. We highlight below several important works and frame their contributions in terms of similarities and differences to the utility-based approach we employ in this article.
Early studies on multi-objective games, often referred to as multi-criteria games, largely focused on arguing their importance and extending relevant solution concepts from single-objective game theory to this setting [blackwell1954analog, shapley1959equilibrium, zeleny1975games]. Most works considered the case in which agents do not know their utility function, and thus define utility-function-agnostic equilibria. [shapley1959equilibrium] introduce the now widely used concept of Pareto Nash equilibria. They extend and characterise the set of mixed-strategy agnostic Pareto-Nash equilibria for multi-objective two-person zero-sum games with linear utility functions. We stress here the apparent disconnect between the proposed solution concept and the presented characterisation. Specifically, mixed strategy Pareto Nash equilibria are defined as strategies that result in non-dominated vectors. However, by their definition this implies that we are considering expected returns, thus leading us to the SER criterion in the utility-based approach. When subsequently scalarising the game, utility functions are conveniently restricted to be linear only, as this allows for discussion of these points in the equivalent single-objective game. As such, they implicitly shift to the expected utility, or ESR, in the utility-based framework. Note that this final scalarisation is only mathematically sound when restricting to linear utility functions [radulescu2020utility].
We highlight that such conflicting approaches are regularly seen in multi-objective game theory. On the one hand, often a utility-agnostic approach is assumed and expected vectors (i.e. SER) are considered [shapley1953stochastic, voorneveld2000ideal, ismaili2018existence]. On the other hand, linear utility functions are assumed in order to facilitate a reduction to an equivalent single-objective game (i.e. ESR) and allow claims to be made about the resulting equilibria [corley1985games, zeleny1975games, lozovanu2005multiobjective]. Note that under such linear utility functions, ESR and SER are in fact equivalent [radulescu2020utility].
There has also been work on formulating algorithms for finding Pareto-Nash equilibria in multi-objective non-cooperative games. [lozovanu2005multiobjective] propose a method that computes the trade-off game (i.e., implicitly assume the ESR criterion) for every linear utility function for which the weights sum to one and subsequently find its NE. In addition, recent work by [ismaili2018existence] highlights the specific properties of pure strategy Pareto-Nash equilibria when considering expected payoff vectors and provides an algorithm for computing these equilibria.
In this work we assume a utility-based perspective. This approach was introduced by [roijers2013survey], who make the distinction between the ESR and SER criteria explicit and highlight their differences. While their work focused mostly on the single-agent case, a recent survey by [radulescu2020multi] assumes this approach in multi-objective multi-agent settings. They further offer a taxonomy of such decision-making problems on the basis of payoffs, utility and the type of desired outcomes.
Conforming to this taxonomy, in this paper we focus on individual utility, i.e., even if agents receive the same payoff vector they may value this payoff vector differently. Furthermore, we assume that no social welfare mechanism is employed, and that we are looking for stable outcomes in settings with self-interested agents [radulescu2020multi]. This approach is also assumed by [radulescu2020utility], who provide several foundational results that we build upon in this article. First, they show that an MONFG under the ESR criterion can always be reduced to a NFG and that when using linear utility functions, an MONFG can always be scalarised a priori. Furthermore, they formally show that under SER no NE need necessarily exist.
As previously mentioned, throughout most earlier work the (implicit) assumption is made that utility functions are linear. Nonetheless, it is a well-known fact that utility functions can be highly non-linear. This has been noted in a variety of scenarios ranging from queueing games [breinberg2017equilibrium] to travel choice behaviour [koppelman1981nonlinear]. [bergstresser1977domination] bring up the idea that utility functions could also be non-linear in multi-objective normal-form games. However, in their practical analysis, they only consider linear utility functions and apply the ESR criterion to obtain the resulting trade-off game and corresponding solution points. Recent work in MONFGs often explores reinforcement learning techniques to find optimal solutions for agents in such settings when operating under non-linear utility functions [radulescu2021opponent, ropke2021communication]. Lastly, in settings where the utility functions are not easily expressed, it can prove interesting to explore different elicitation strategies. One approach that has proven successful examined the use of Gaussian processes to elicit user preferences by asking the user to rank items or perform simple pairwise comparisons [zintgraf2018ordered, roijers2021interactive].
9 Conclusion and Future Work
In this article, we explored the theoretical foundations of normal-form games with vectorial payoffs, also referred to as multi-objective normal-form games. To this end, we assumed a utility-based approach that guarantees the existence of a utility function that can be used to scalarise a payoff vector [roijers2013survey]. A known complicating factor is that such a scalarisation can occur at two stages, specifically when considering the utility of mixed strategies. We refer to these approaches as optimisation criteria, as agents are rational actors that wish to optimise for their particular criterion. The first option is to scalarise the payoff vectors for each joint action in the payoff matrix directly and calculate the expected utility of mixed strategies, also referred to as the expected scalarised returns criterion (ESR). On the other hand, we can also calculate the expected payoff vector with respect to the mixed strategy and calculate the utility of this expected vector. This criterion is also known as the scalarised expected returns criterion (SER). Earlier work showed that in this latter approach, no Nash equilibrium needs to exist [radulescu2020utility].
We first prove that the existence of a mixed strategy Nash equilibrium can be guaranteed in the setting of MONFGs under SER when we restrict agents to employ only continuous quasiconcave utility functions. On the other hand, restricting players to strictly convex utility functions, which are in particular quasiconvex, does not guarantee a Nash equilibrium. Given that agents in multi-agent systems are often assumed or incentivised to play according to Nash equilibria, such guarantees will prove important in the development of future algorithms.
Next, we explored whether there exists a relationship in either the number of Nash equilibria or the equilibria themselves in MONFGs under both optimisation criteria. We found that even when an equilibrium exists under both criteria, the number of Nash equilibria can differ. In addition, no Nash equilibrium need be shared between the two criteria.
These negative results led us to restrict our focus to pure strategy Nash equilibria (PSNE). We show that when focusing only on these equilibria, we can guarantee that the set of pure strategy Nash equilibria under SER equals that under ESR when assuming quasiconvex utility functions. Without this assumption, we are only able to guarantee that a PSNE under SER is also a PSNE under ESR. Due to the generality of this result, we further investigated blended settings in which agents are allowed to optimise for different criteria. We contributed a novel definition of a Nash equilibrium for this setting and subsequently showed that when assuming only quasiconvex utility functions, the set of PSNE in any blended setting is equivalent to that of the trade-off game.
Our last contribution added a novel algorithm for finding all PSNE in any game with vectorial payoffs, irrespective of whether agents are optimising for ESR, SER or any blended settings. This algorithm first scalarises the input MONFG and subsequently calculates the PSNE in this game. Due to the previous theorems, we can guarantee the correctness of this approach.
For future work, we aim to focus on two possible directions. First, we propose to look further into the theoretical foundations of MONFGs. Given that quasiconcave utility functions guarantee Nash equilibria, it is interesting to explore what other assumptions on utility functions give rise to NE. In addition, it can prove interesting to study specific restrictions on payoff structures such as zero-sum games to find whether the assumptions on utility functions can be relaxed. Furthermore, we already addressed that blended settings are left almost completely unexplored. Due to the practical relevance of such settings, we aim to provide stronger theoretical guarantees in these scenarios as well.
The second direction for future work that we find deserving of attention is the algorithmic aspect. In this article, we contribute an initial algorithm for finding all PSNE in an MONFG when assuming quasiconvex utility functions. We believe that designing an algorithm that is able to do this for any type of utility function or even for mixed-strategy NE in general will prove a worthwhile pursuit. In single-objective NFGs, such approaches already exist [lemke1964equilibrium]. Furthermore, more efficient methods can be applied on subclasses of NFGs that contain specific properties such as a zero-sum payoff structure. Studying whether such methods apply in the setting of MONFGs can also prove important.
The first author is supported by the Research Foundation – Flanders (FWO), grant number 1197622N. This research was supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.