As the application of learning in multi-agent settings gains traction, game theory has emerged as an informative abstraction for understanding the coupling between algorithms employed by individual players (see, e.g., [fudenberg1998theory, mazumdar2018convergence, chasnov:2019aa]). Due to scalability, a commonly employed class of algorithms in both games and modern machine learning approaches to multi-agent learning is gradient-based learning, in which players update their individual actions using the gradient of their objective with respect to their action. In the gradient-based learning paradigm, continuous quadratic games stand out as a benchmark due to their simplicity and ability to exemplify state-of-the-art multi-agent learning methods such as policy gradient and alternating gradient-descent-ascent [mazumdar2019policy].
Despite the resurgence of interest in learning in games, a gap exists between algorithmic performance in simulation and physical application, in part due to disturbances in measurements [shalev2017failures]. Robustness to environmental noise has been analyzed in a wide variety of learning paradigms [li2019robust, bottou2010large]. Most analyses focus on independent and identically distributed stochastic noise drawn from a stationary distribution.
In contrast, we study adversarial disturbance without any assumptions on its dynamics or bounds on its magnitude. Though some work exists on the effects of bounded adversarial disturbance in multi-agent learning [jiao2016multi], there is limited understanding of how gradient disturbance propagates through the network structure as determined by the coupling of the players’ objectives. Does gradient-based learning fundamentally contribute to or reduce the propagation of disturbance through player actions? Our analysis aims to answer this question for gradient-based multi-agent learning dynamics. The insights we gain provide desiderata to support algorithm synthesis and incentive design, and will lead to improved robustness of multi-agent learning dynamics.
Contributions. The main contribution is providing a novel graph-theoretical perspective for analyzing disturbance decoupling in multi-agent learning settings. For quadratic games, we obtain a necessary and sufficient condition, which can be verified in polynomial time, that ensures complete decoupling between the corrupted gradient of one player and the learned actions of another player, stated in terms of algebraic and graph-theoretic conditions. The latter perspective leads to greater insight on the types of cost coupling structures that enjoy disturbance decoupling, and hence, provides a framework for designing agent interactions, e.g., via incentive design or algorithm synthesis. Applied to LQ games, a benchmark for multi-agent policy gradient algorithms, we show that disturbance decoupling enforces necessary constraints on the controllable subspace in relation to the unobservable subspace of individual players. Applied to bilinear games, we show that disturbance decoupling enforces necessary constraints on the players’ payoff matrices.
II Related Work
We study gradient-based learning for $N$-player quadratic games with continuous cost functions and action sets. Convergence guarantees for gradient-based learning have been studied from numerous perspectives, including game theory [fudenberg1998theory, ratliff:2016aa, chasnov:2019aa], control [shamma2005dynamic], and machine learning [zhou:2017aa, mazumdar2018convergence].
Convergence guarantees for gradient-based learning dynamics under stochastic noise are studied in [chasnov:2019aa, mazumdar2018convergence, zhou:2017aa]. Despite being an important property to understand for adversarial disturbance, there are no existing guarantees on how non-stochastic noise propagates through the player network.
Our analysis draws on geometric control [trentelman2012control, wonham1974linear, dion2003generic]. In [trentelman2012control], algebraic conditions for disturbance decoupling within a single dynamical system are given. In [dion2003generic], disturbance decoupling for a single structured dynamical system is studied with frequency-based techniques. In this paper, we provide both algebraic and graph-theoretic conditions for disturbance decoupling of the coupled dynamical systems arising in gradient-based multi-agent learning.
III Continuous games and the game graph model
Let $\mathcal{N} = \{1, \ldots, N\}$ denote the index set of players, where $N$ is the number of players. For a function $f$ with argument $x = (x_1, \ldots, x_N)$, $D_i f$ is the partial derivative of $f$ with respect to $x_i$.
Consider an $N$-player continuous game $(f_1, \ldots, f_N)$ where for each $i \in \mathcal{N}$, $f_i : X \to \mathbb{R}$ is player $i$'s cost function and $X = X_1 \times \cdots \times X_N$ is the joint action space, with $X_i$ denoting player $i$'s action space and $x_{-i}$ denoting the joint action of all players excluding player $i$. Each player's goal is to select an action to minimize its cost given the actions of all other players. That is, player $i$ seeks to solve the following optimization problem: $\min_{x_i \in X_i} f_i(x_i, x_{-i})$.
One of the most common characterizations of the outcome of a continuous game is a Nash equilibrium.
Definition 1 (Nash equilibrium).
For an $N$-player continuous game $(f_1, \ldots, f_N)$, a joint action $x^* \in X$ is a Nash equilibrium if for each $i \in \mathcal{N}$, $f_i(x_i^*, x_{-i}^*) \leq f_i(x_i, x_{-i}^*)$ for all $x_i \in X_i$.
III-A Gradient-based learning
We consider a class of simultaneous-play, gradient-based multi-agent learning techniques such that at iteration $k$, player $i$ receives $\hat{g}_i(x^k)$ from an oracle and updates its action as follows:
$$x_i^{k+1} = x_i^k - \gamma_i \hat{g}_i(x^k), \qquad (2)$$
where $\gamma_i$ is player $i$'s step size, and
$$\hat{g}_i(x^k) = D_i f_i(x^k) + w_i^k \qquad (3)$$
is player $i$'s gradient evaluated at the current joint action and affected by a player-specific, arbitrary additive disturbance $w_i^k$. In the setting we analyze, $w_i^k$ can modify $x_i^{k+1}$ to any other action within $X_i$.
Under reasonable assumptions on step sizes (e.g., relative to the spectral radius of the Jacobian of the game dynamics in a neighborhood of a critical point), it is known that the undisturbed dynamics converge [mazumdar2018convergence, chasnov:2019aa]. While such a guarantee cannot be given for the arbitrary disturbances considered in this paper, we provide conditions under which a subset of players still equilibrates and follows the undisturbed dynamics.
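The update (2) can be sketched numerically. The following minimal example, with assumed illustrative costs and a common step size (none of these numbers come from the paper), runs undisturbed simultaneous gradient play for a two-player quadratic game and checks that both players' individual gradients vanish at the limit point, i.e., the iterates reach a Nash equilibrium:

```python
import numpy as np

# Hypothetical 2-player quadratic game with scalar actions:
#   f1(x1, x2) = 0.5*a1*x1^2 + c1*x1*x2,   f2(x1, x2) = 0.5*a2*x2^2 + c2*x1*x2.
a1, a2, c1, c2 = 2.0, 3.0, 0.5, -0.5   # assumed coefficients
gamma = 0.1                            # assumed common step size

def grad1(x1, x2):  # D1 f1, player 1's individual gradient
    return a1 * x1 + c1 * x2

def grad2(x1, x2):  # D2 f2, player 2's individual gradient
    return a2 * x2 + c2 * x1

x1, x2 = 1.0, -1.0
for _ in range(500):
    g1, g2 = grad1(x1, x2), grad2(x1, x2)      # oracle gradients, no disturbance
    x1, x2 = x1 - gamma * g1, x2 - gamma * g2  # simultaneous update (2)

# Both individual gradients vanish at the Nash equilibrium.
print(abs(grad1(x1, x2)) < 1e-8, abs(grad2(x1, x2)) < 1e-8)
```

With this choice of coefficients the joint update map $I - \gamma A$ has spectral radius $0.75 < 1$, so the iterates contract to the unique Nash equilibrium at the origin.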
III-B Quadratic games
For an $N$-player continuous game, the behavior of gradient-based learning around a local Nash equilibrium can be approximated by linearizing the learning dynamics, where the linearization corresponds to a quadratic game.
Definition 2 (Quadratic game).
For each $i \in \mathcal{N}$, the cost $f_i$ is defined by $f_i(x_i, x_{-i}) = \tfrac{1}{2} x_i^\top A_{ii} x_i + x_i^\top \sum_{j \neq i} A_{ij} x_j + x_i^\top b_i$.
Quadratic games encompass potential games [monderer1996potential], in which $A_{ij} = A_{ji}^\top$ for all $i, j \in \mathcal{N}$, and zero-sum games [gillies1959solutions], in which $\sum_{i \in \mathcal{N}} f_i \equiv 0$. We give further examples of quadratic games in Section III-D.
III-C Game graph
To highlight how an individual player’s action updates depend on others’ actions, we associate a directed graph to the gradient-based learning dynamics defined in (2).
We consider a directed graph $G = (V, E)$, where $V = \mathcal{N}$ is the index set for the nodes in the graph and $E$ is the set of edges. Each node $i$ is associated with action $x_i$ of player $i$. A directed edge $e_{ij}$ points from node $j$ to node $i$ and has weight matrix $W_{ij} = -\gamma_i A_{ij}$, such that $e_{ij} \in E$ if $W_{ij} \neq 0$ element-wise. For each node $i$, we assume the self-loop edge $e_{ii}$ always exists and has weight $W_{ii} = I - \gamma_i A_{ii}$. The composite matrix $W$ with entries $W_{ij}$ is the adjacency matrix of the game graph.
On a game graph, we define a path as a sequence of nodes connected by edges. The set of paths $\mathcal{P}^{\ell}_{ij}$ includes all paths starting at node $j$ and ending at node $i$, traversing $\ell$ nodes in total. For a path $p \in \mathcal{P}^{\ell}_{ij}$, we define its path weight as the product of the weights of consecutive edges on the path.
In the absence of disturbances $w_i^k$, the update in (2) for a quadratic game reduces to
$$x^{k+1} = W x^k + \beta, \qquad (5)$$
where $W$ is the adjacency matrix of the game graph with blocks $W_{ii} = I - \gamma_i A_{ii}$ and $W_{ij} = -\gamma_i A_{ij}$ for $j \neq i$, and $\beta$ is the stacked vector with entries $-\gamma_i b_i$.
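The reduction of the per-player gradient steps to a single linear system driven by the game-graph adjacency matrix can be checked directly. The sketch below uses an assumed 3-player quadratic game with scalar actions and randomly drawn off-diagonal couplings (all values illustrative):

```python
import numpy as np

# Hypothetical 3-player quadratic game with scalar actions:
#   f_i(x) = 0.5*A[i,i]*x_i^2 + x_i * sum_{j != i} A[i,j]*x_j  (b_i = 0 here).
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
A[np.diag_indices(3)] = 3.0          # strongly convex individual costs
gammas = np.array([0.1, 0.2, 0.15])  # assumed per-player step sizes

# Adjacency matrix: W_ii = 1 - gamma_i*A_ii, W_ij = -gamma_i*A_ij.
W = np.eye(3) - np.diag(gammas) @ A

x = np.array([1.0, -2.0, 0.5])
per_player = x - gammas * (A @ x)    # each player's individual gradient step
stacked = W @ x                      # game-graph form x^{k+1} = W x^k

print(np.allclose(per_player, stacked))  # → True
```

The two formulations coincide because player $i$'s individual gradient is exactly the $i$-th row of $Ax$, so subtracting the scaled gradient is the same as applying $I - \Gamma A$ to the joint action.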
III-D Subclasses of games within quadratic games
To both illustrate the breadth of quadratic games and provide exemplars of the game graph concept, we describe two important subclasses of games and their game graphs.
III-D1 Finite horizon LQ game
Given initial state $z^0$ and horizon $T$, each player $i$ in an $N$-player, finite-horizon LQ game selects an action sequence $u_i = (u_i^0, \ldots, u_i^{T-1})$ in order to minimize a cumulative state and control cost subject to the state dynamics $z^{t+1} = A z^t + \sum_{j \in \mathcal{N}} B_j u_j^t$:
The LQ game defined by the collection of optimization problems (6) for each is equivalent to a one-shot quadratic game in which each player selects with , in order to minimize their cost defined by
where is the joint action profile, and the cost matrices are given by ,
and . This follows precisely from observing that the dynamics are equivalent to where . From here, it is straightforward to rewrite the optimization problem in (6) as . The LQ game is a potential game if and only if and for all .
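The one-shot reduction can be sketched numerically for a single player. Assuming illustrative scalar data (none of the numbers are from the paper), stacking the dynamics over the horizon gives the states as an affine function of the stacked input, so the cumulative LQ cost becomes an explicit quadratic in that input:

```python
import numpy as np

# Assumed scalar LQ data: dynamics z^{t+1} = A z^t + B u^t, cost Q*z^2 + R*u^2.
A, B, Q, R, T, z0 = 0.8, 1.0, 2.0, 0.5, 3, 1.5

# Stack the dynamics: (z^1, ..., z^T) = F*z0 + G @ u, G lower triangular.
F = np.array([A ** (t + 1) for t in range(T)])
G = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        G[t, s] = A ** (t - s) * B

def cost(u):
    z = F * z0 + G @ u                      # trajectory from stacked input
    return Q * np.sum(z ** 2) + R * np.sum(u ** 2)

# One-shot quadratic form: u^T H u + 2 q^T u + c0.
H = Q * G.T @ G + R * np.eye(T)
q = Q * G.T @ F * z0
c0 = Q * (F * z0) @ (F * z0)

u = np.array([0.3, -0.2, 0.1])
print(np.isclose(cost(u), u @ H @ u + 2 * q @ u + c0))  # → True
```

In the $N$-player game each player's cost is reduced the same way, with the cross terms between players' stacked inputs supplying the quadratic couplings of the one-shot game.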
LQ Game Graph. Suppose each player $i$ uses step size $\gamma_i$. Since the gradient $D_i f_i$ is given by
the learning dynamics (5) are equivalent to
where , with a blockwise matrix having entries if and otherwise.
III-D2 Bilinear games
Bilinear games are an important class of games. For instance, a number of game formulations in adversarial learning have a hidden bilinear structure [vlatakis2019poincare]. In evaluating and selecting hyper-parameter configurations in so-called test suites, pairwise comparisons between algorithms are formulated as bimatrix games [balduzzi2018re, balduzzi2020smooth].
Formally, a two-player bilinear game (the bilinear game formulation and corresponding game graph for different gradient-based learning rules easily extend to an $N$-player setting; however, the results in Sec. IV are presented for two-player games), a subclass of continuous quadratic games, is defined by bilinear costs coupling the two players' actions through their respective payoff matrices. Under common approaches to learning in games [vlatakis2019poincare, bailey2019finite], simultaneous and alternating gradient play both correspond to linear systems.
Game graph for simultaneous gradient play. Players update their strategies simultaneously by following the gradient of their own cost with respect to their choice variable:
The simultaneous gradient play game graph is given by
Game graph for alternating gradient play. In zero-sum bilinear games, it has been shown that alternating gradient play has better convergence properties [bailey2019finite]. Alternating gradient play is defined by
Examining the second player's update, we see that it depends on player 1's already-updated action $x_1^{k+1}$. The game graph in this case is defined by
IV Disturbance Decoupling on Game Graph
In this section, we derive a necessary and sufficient condition that ensures decoupling of a gradient disturbance from the learning trajectory of a subset of players. We emphasize that the condition holds for disturbances of arbitrary magnitude and arbitrary functional form. This is a useful result because it provides guarantees on both the equilibrium behavior and the learning trajectory under adversarial disturbance.
Definition 3 (Complete disturbance decoupling).
Given initial joint action $x^0$, game costs $(f_1, \ldots, f_N)$, and step sizes $(\gamma_1, \ldots, \gamma_N)$, suppose that player $i$'s gradient update is corrupted as in (3). Then player $j$'s action $x_j$ is decoupled from the disturbance in player $i$'s gradient if the uncorrupted and corrupted dynamics, given respectively by
result in identical trajectories for player $j$ for any disturbance sequence. That is, $x_j^k = \tilde{x}_j^k$ holds for all $k \geq 0$ and all disturbances $\{w_i^k\}$, where
IV-A Algebraic condition
We first derive an algebraic condition on the joint action space for disturbance decoupling. Define the disturbance input matrix $B_i$, whose columns select player $i$'s coordinates of the joint action, and let $\mathcal{B}_i$ denote the image of $B_i$.
Consider an -player quadratic game as in Definition 2 under learning dynamics as given by (2), where player experiences gradient disturbance as given by (3). Let be the joint action subset. For player , the following statements are equivalent:
Player is disturbance decoupled from player .
, , .
, , where and are matrices such that and .
For a quadratic game , the learning dynamics without and with disturbances reduce to the equations in (14). Given initial joint action ,
Then, Definition 3 is equivalent to satisfied for and . Since the condition holds for all , it is equivalent to for all and . This is then equivalent to for all and . To see this equivalence, consider the following result from Cayley-Hamilton theorem, for some . Thus, for and any , where for , which implies that . This concludes the equivalence.
Finally, we note that is a restatement of . Furthermore, can be verified in polynomial time. ∎
In connection to geometric control theory, the condition of Proposition 1 is equivalent to the fact that the smallest $W$-invariant subspace containing $\mathcal{B}_i$ must be a subset of the kernel of the map extracting player $j$'s action [trentelman2012control, Thm 4.6].
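The polynomial-time verification can be sketched as follows. Under the standard Cayley-Hamilton reduction, decoupling amounts to checking that player $j$'s coordinates of $W^k B_i$ vanish for $k = 0, \ldots, n-1$, i.e., that the smallest $W$-invariant subspace containing $\operatorname{im}(B_i)$ lies in the kernel of $E_j^\top$. The matrices $W$, $B_i$, and $E_j$ below are illustrative stand-ins:

```python
import numpy as np

def is_decoupled(W, B_i, E_j, tol=1e-10):
    """Check E_j^T W^k B_i == 0 for k = 0, ..., n-1 (Cayley-Hamilton bound)."""
    n = W.shape[0]
    M = B_i.copy()
    for _ in range(n):              # spans {B_i, W B_i, ..., W^{n-1} B_i}
        if np.max(np.abs(E_j.T @ M)) > tol:
            return False
        M = W @ M
    return True

# Illustrative 3-node chain 1 -> 2 -> 3 with no feedback from 3 to 1.
W = np.array([[0.9, 0.0, 0.0],
              [0.3, 0.9, 0.0],
              [0.0, 0.3, 0.9]])
B1 = np.array([[1.0], [0.0], [0.0]])  # disturbance enters at node 1
E3 = np.array([[0.0], [0.0], [1.0]])  # observe node 3
E1 = np.array([[1.0], [0.0], [0.0]])  # observe node 1

print(is_decoupled(W, B1, E3))  # False: node 1's disturbance reaches node 3
print(is_decoupled(W, E3, E1))  # True: node 3 never influences node 1
```

The check costs $n$ matrix-vector products per disturbance column, hence polynomial time, as noted in the proposition.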
IV-B Graph-theoretic condition
Next we derive the graph-theoretic condition on the joint action space for disturbance decoupling.
The result follows from equivalence between Proposition 1 condition and (15). Note that is equivalent to for all , and is equivalent to for all . We prove the result by induction. For , holds if and only if . For , is equivalent to and . Suppose that for , is the sum of path weights over all paths of length , originating at and ending at , then is the sum of path weights over all paths of length , originating at and ending at . Let , then , where if and only if the sum of path weights of length from to is nonzero and there is an edge from to . Furthermore, is the sum of path weights over all paths of length from to each of which contains . Since we sum over , we conclude that is the sum of all paths weights of length from to , i.e., . ∎
The concept of disturbance decoupling is counterintuitive: changes in player $i$'s action do not affect player $j$'s action, despite the latter being implicitly dependent on the former through the network of player cost functions. As we see from the proof of Theorem 1, this situation arises when the dependencies 'cancel' each other out, i.e., the sum of path weights from $i$ to $j$ is always zero for paths of equal length.
Example 1 (Disturbance decoupled players).
Consider a four-player quadratic game where and the game graph is given by Figure 1. Edge weights , , , and , while each self loop has weight . Paths of length from player to player are enumerated as , , and , . To satisfy Theorem 1, the sum of path weights for each must be for . There are no paths of length one; summation for implies the criterion , and summation for implies the criterion . If , is necessary and sufficient for disturbance decoupling between player 1 and player 4.
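The path-weight cancellation of Theorem 1 can be demonstrated by simulation. The sketch below uses a hypothetical four-node diamond graph (not the exact configuration of Example 1): node 1 feeds node 4 through two branches, 1→2→4 with weights $a, b$ and 1→3→4 with weights $c, d$. With a uniform self-loop weight, every path of a given length uses exactly one branch, so all length-wise sums vanish whenever $ab + cd = 0$:

```python
import numpy as np

s = 0.9                            # uniform self-loop weight (assumed)
a, b, c, d = 0.2, 0.5, 0.4, -0.25  # a*b + c*d = 0.1 - 0.1 = 0 (cancellation)
W = np.array([[s, 0, 0, 0],
              [a, s, 0, 0],
              [c, 0, s, 0],
              [0, b, d, s]])

rng = np.random.default_rng(1)
x_clean = np.zeros(4)
x_noisy = np.zeros(4)
for _ in range(200):
    w = rng.normal(scale=10.0)     # arbitrary, large disturbance at player 1
    x_clean = W @ x_clean
    x_noisy = W @ x_noisy + np.array([w, 0.0, 0.0, 0.0])

print(np.allclose(x_noisy[3], x_clean[3]))  # player 4 unaffected → True
print(np.allclose(x_noisy[1], x_clean[1]))  # player 2 is affected → False
```

Player 4's trajectory is bit-for-bit identical to the undisturbed one even though player 2, on the same paths, is heavily corrupted, illustrating that decoupling is a property of the coupling structure rather than of the disturbance magnitude.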
Disturbance decoupling is a structural property of the game in terms of disturbance propagation and attenuation. An open research problem is linking this structural property to robust decision making under uncertainties in cost parameters , and step sizes .
The following corollary specializes to the class of potential games [monderer1996potential], which arise in many applications [paccagnan, lutati2014congestion, alpcan2010network].
In a potential game graph, . Therefore, a path with path weight exists from to if and only if a path with path weight exists from to . Therefore, (15) holds from player to player if and only if it holds from player to player . ∎
Consider an $N$-player finite-horizon LQ game as in (6) under learning dynamics as given by (9), where player $i$ experiences gradient disturbance as given by (3). If disturbance decoupling holds between player $j$ and the gradient disturbance from player $i$, then
If is positive definite and , the controllable subspace of must lie in the unobservable subspace of where , , and .
For player to be disturbance decoupled from player , edge cannot exist, i.e. from (7). Expanding , is given by . We unwrap these conditions starting from , ; in this case is necessary. Then we consider , which implies that is necessary. Subsequently, this implies that all is necessary for . Similarly, we note that and . From these we can use the rest of to conclude that for any . This condition is equivalent to (16). ∎
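The subspace condition can be verified numerically. The sketch below checks whether the controllable subspace of one player's input matrix lies in the unobservable subspace of another player's state-cost matrix; all matrices are illustrative stand-ins, not taken from the paper:

```python
import numpy as np

def controllable_subspace(A, B):
    """Columns spanning the reachable set: [B, AB, ..., A^{n-1}B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def check_condition(A, B_j, Q_i, tol=1e-10):
    """Containment holds iff Q_i A^k C == 0 for k = 0, ..., n-1, where C
    spans the controllable subspace of (A, B_j); i.e., that subspace lies
    in the kernel of the observability matrix of (Q_i, A)."""
    n = A.shape[0]
    M = controllable_subspace(A, B_j)
    for _ in range(n):
        if np.max(np.abs(Q_i @ M)) > tol:
            return False
        M = A @ M
    return True

A = np.diag([0.5, 0.5, 0.5])
B_j = np.array([[1.0], [0.0], [0.0]])  # player j only steers state 1
Q_i = np.array([[0.0, 0.0, 1.0]])      # player i's cost only weighs state 3

print(check_condition(A, B_j, Q_i))        # → True: subspaces do not overlap
print(check_condition(A, B_j, np.eye(3)))  # → False: cost observes everything
```

Intuitively, player $j$'s control authority must be invisible to player $i$'s cost along the whole horizon for the gradient disturbance to have no channel into player $i$'s update.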
We apply Theorem 1 to two player bilinear games and prove a necessary condition for disturbance decoupling between different coordinates of each player’s action space that is independent of players’ step sizes.
Consider a two player bilinear game under learning dynamics (10) and (12), where coordinates and experience gradient disturbance as given by (3). If and coordinate is disturbance decoupled from coordinate , must satisfy , where and denote the elements of and , respectively. Similarly, if and coordinate is disturbance decoupled from coordinate , must satisfy .
We construct games played by players with actions and whose game graphs are identical to (11) and (13). First consider disturbance decoupling of from . In both learning dynamics, do not have any edges between players. Therefore, paths between and with length are given by . We sum path weights over to obtain for disturbance decoupling of from in (10) and (12). A similar argument follows for disturbance decoupling of from in (10). For disturbance decoupling of from in (12), we note that an edge from to exists with weight when . Disturbance decoupling requires , therefore . ∎
Consider a two player bilinear game under learning dynamics (10) and (12), where coordinates and experience gradient disturbance as given by (3). If coordinate is disturbance decoupled from coordinate , must satisfy and , where and denote the elements of and , respectively. If coordinate is disturbance decoupled from coordinate , must satisfy and .
We construct games played by players with actions and whose game graphs are identical to (11) and (13). In both learning dynamics, disturbance decoupling requires no direct path between the decoupled players. Therefore or .
Consider disturbance decoupling of from in (10): paths of length from to without self loops are given by , . A path of length with self loops must also include , whose weight is . We sum path weights over to obtain . A similar argument is made for disturbance decoupling of from in (10).
Consider disturbance decoupling of from in (12): paths of length from to without self loops are given by . A path of length with self loops must also include , whose weight is . The weight of is given by . We sum path weights over to obtain . A similar argument is made for disturbance decoupling of from in (12). ∎
V Numerical Example
We provide an example of disturbance decoupling in an LQ game. Consider a tug-of-war game in which a single target is controlled by four players. We assume that player $i$ can move the target along a fixed vector, and that the target is stationary without any player input. Starting with a randomized initial condition, at each step $t$ the target moves according to the given linear dynamics. Each player $i$'s cost function is given by
which describes player $i$'s objective to move the target towards its desired position in finite time using a minimal amount of control. By designing the game dynamics to satisfy Theorem 1, we ensure that player $j$'s action is disturbance decoupled from player $i$'s.
We use the equivalent one-shot formulation described in Section III-D1. Hence, the learning dynamics take the form (9), with
To ensure convergence of the undisturbed learning dynamics [chasnov:2019aa], we use uniform step sizes such that with , where and with and
denoting the maximum and minimum eigenvalues of their arguments, respectively. The associated game graph is given in Figure 1, where and . A path of length must have path weight