
Principal Trade-off Analysis

This paper develops Principal Trade-off Analysis (PTA), a decomposition method, analogous to Principal Component Analysis (PCA), which permits the representation of any game as the weighted sum of disc games (continuous R-P-S games). Applying PTA to empirically generated tournament graphs produces a sequence of embeddings into orthogonal 2D feature planes representing independent strategic trade-offs. Each trade-off generates a mode of cyclic competition. Like PCA, PTA provides optimal low rank estimates of the tournament graphs that can be truncated for approximation. The complexity of cyclic competition can be quantified by computing the number of significant cyclic modes. We illustrate the PTA via application to a pair of games (Blotto, Pokemon). The resulting 2D disc game representations are shown to be well suited for visualization and are easily interpretable. In Blotto, PTA identifies game symmetries, and specifies strategic trade-offs associated with distinct win conditions. For Pokemon, PTA embeddings produce clusters in the embedding space that naturally correspond to Pokemon types, a design in the game that produces cyclic trade offs.


1 Introduction

In recent years algorithms have achieved superhuman performance in a number of complex games such as Chess, Go, Shogi, Poker and Starcraft [silver2018general, heinrich2016deep, moravvcik2017deepstack, Grandmaster]. Despite impressive game play, enhanced understanding of the game is typically only achieved by additional analysis of the algorithms' game play post facto [chess_new_light]. Current work overemphasizes the "policy problem", developing strong agents, despite growing demand for a task theory which addresses the "problem problem", i.e. what games are worth study and play [omidshafiei2020navigating, clune2019ai]. A task theory requires a language that characterizes and categorizes games, namely, a toolset of measures and visualization techniques that evaluate and illustrate game structure. Summary visuals and measures are especially important for complex games where direct analysis is intractable. In such cases, tournaments are used to sample the game and to empirically evaluate agents. The empirical analysis of tournaments has a long history in sports analytics [lewis2004moneyball, bozoki2016application], ecology and animal behavior [Laird, silk1999male], and biology [stuart2006multiple, Sinervo]. While the primary interest in these cases is typically in ranking agents/players, tournament graphs also reveal significant information about the nature of the game being played [tuyls2018generalised]. This paper describes mathematical techniques for extracting a wealth of information about the underlying game structure directly from tournament data. While these methods can be applied to the various contexts in which tournaments are already employed in machine learning (e.g., population based training), they open up a range of new research questions regarding the characterization of natural games, synthesis of artificial games (cf. [omidshafiei2020navigating]), approximating games with simplified dynamics, and the strategic perturbation of games.

Fine structural characteristics of a tournament graph can be represented by low dimensional embeddings that map competitive relationships to embedded geometry. We review and expand on methods introduced by [balduzzi2018re], who proposed a canonical series of maps that provide a complete description of a sample tournament in terms of a sum of simple games, namely, disc games.

Our contributions are as follows. First, we compare PCA [pearson1901liii] to disc game embedding, and show that disc game embedding inherits key algebraic properties responsible for the success of PCA. Based on this analogy, we propose PTA as a general technique for visualizing data arising from competitive tasks or pairwise choice tasks. Indeed, while we focus on games for their charisma, any data set representing a skew-symmetric comparison of objects is amenable to PTA. Through a series of examples, we show that PTA provides a much richer framework for analyzing trade-offs in games than previously demonstrated. Our examples exhibit a wide variety of strategic structures that can be clearly visualized with PTA. Unlike previous work, we focus on the relation between embedding coordinates, which represent performance relations, and underlying agent attributes in order to enumerate the principal trade-offs responsible for cyclic competition in each game. We consider the full information content of PTA by analyzing multiple leading disc games, and by studying the decay in their importance. Important strategic trade-offs can arise in later disc games, so previous empirical work's focus on the leading disc game is myopic. These examples also raise conceptual limitations not addressed in previous work, and thus outline future directions for development.

2 Related Work

Our work builds directly on [balduzzi2018re], which used the embedding approach to introduce a comprehensive agent evaluation scheme. Their scheme uses the real Schur form (PTA) in conjunction with the Hodge decomposition to overcome deficiencies in standard ranking models. Our work also complements efforts to explore cyclic structures in competitive systems [Candogan, HHD], in economics [linares2009inconsistent, may1954intransitivity], and, tangentially, in multi-class classification problems [bilmes2001intransitive, huang2006generalized]. Cycles challenge traditional gradient methods and can slow training [omidshafiei2020navigating, balduzzi2018mechanics]. Moreover, cyclic structures in games are often intricate and difficult to disentangle, particularly among intermediate competitors. Games of skill frequently exhibit this "spinning top geometry" [czarnecki2020real]. By summarizing cyclic structures, PTA helps identify areas of the strategy space that cause difficulty during training, or that should be targeted for diverse team design [balduzzi2019open, garnelo2021pick]. In particular, we show that PTA can identify fundamental trade-offs that summarize otherwise opaque cyclic structure. Trade-offs play an important role in decision tasks and evolutionary processes outside of games, so general tools that isolate and reify trade-offs are of generic utility [omidshafiei2020navigating, tuyls2018generalised]. In that sense, our attempt to visualize game structure is in line with generic data visualization efforts, which aim to convert complicated data into elucidating graphics (cf. [healy2018data, garnelo2021pick]).

3 Background

3.1 Functional Form Games

A two-player zero-sum functional form game is defined by an attribute space $\Omega$ and an evaluation function $f$ that defines the advantage of one agent over another given their attributes. Agents in the game can be represented by their attribute vectors $x, y \in \Omega$, the entries of which could represent agent traits, strategic policies, weights in a neural net governing their actions, or more generally, any attributes that influence competitive behavior. The function is of the form $f : \Omega \times \Omega \to \mathbb{R}$. The value $f(x, y)$ quantifies the advantage of agent $x$ over agent $y$ with a real number. The evaluation function must be fair, that is, the advantage of one competitor over another should not depend on the order they are listed in. Consequently, $f$ must be skew symmetric, $f(x, y) = -f(y, x)$ [HHD]. If $f(x, y) > 0$ we say that $x$ beats $y$, and the outcome is a tie if $f(x, y) = 0$. The larger $|f(x, y)|$, the larger the advantage one competitor possesses over another. We do not specify how advantage is measured, since the appropriate definition may depend on the setting. Possible examples include expected return in a zero-sum game, probability of win minus a half, or log odds of victory. With a set of $n$ agents $\{x_1, \dots, x_n\}$, pairwise comparison of all agents gives an $n \times n$ evaluation matrix $F$ where $F_{ij} = f(x_i, x_j)$. This matrix can be analyzed to study the structure of the game among those competitors, i.e. the resulting tournament.

3.2 Disc Games

The cyclic component of a tournament can be visualized using a combination of simple cyclic games [balduzzi2019open, balduzzi2018re]. The simplest cyclic functional form game is a disc game, which acts as a continuous analog to rock-paper-scissors (RPS) in two-dimensional attribute spaces. The disc game evaluation function is the cross product between the competitors' embedded attributes,

$$\text{disc}(x, y) = x \times y = x_1 y_2 - x_2 y_1 = x^\top R\, y, \qquad (1)$$

where $R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ is the $2 \times 2$ ninety degree rotation matrix [balduzzi2018re]. The cross product models a basic trade-off between the two attributes. Any evaluation matrix can be represented with a sum of pointwise embeddings into a sequence of disc games. The necessary construction is reviewed in the next section.
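As a small illustration of the disc game, the sketch below evaluates the cross product for three hypothetical agents placed 120 degrees apart on the unit circle. The names `disc`, `point`, `rock`, `paper`, and `scissors` are illustrative and not taken from the paper, and the sign convention for the rotation matrix is the one for which $x^\top R\, y$ equals the planar cross product.

```python
import numpy as np

# Rotation matrix chosen so that x^T R y equals the planar cross product.
R = np.array([[0.0, 1.0], [-1.0, 0.0]])

def disc(x, y):
    """Advantage of x over y in a disc game: x1*y2 - x2*y1."""
    return float(x @ R @ y)

def point(radius, degrees):
    """Planar point given in polar form (phase measured counterclockwise)."""
    t = np.deg2rad(degrees)
    return radius * np.array([np.cos(t), np.sin(t)])

# Hypothetical agents: each one sits 120 degrees clockwise of the previous one.
rock, paper, scissors = point(1, 90), point(1, -30), point(1, -150)
agents = [rock, paper, scissors]

# Evaluation matrix F[i, j] = disc(agent_i, agent_j).
F = np.array([[disc(a, b) for b in agents] for a in agents])
print(np.round(F, 3))
```

With this convention the agent embedded clockwise of its opponent holds the advantage, so paper beats rock, scissors beats paper, and rock beats scissors.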

4 Principal Trade-off Analysis (PTA)

Any real, $n \times n$, skew-symmetric matrix $A$ admits a Schur decomposition (real Schur form), $A = Q U Q^\top$. Here $Q$ is an orthonormal matrix, and $U$ is block diagonal with $2 \times 2$ blocks of the form $\omega_k R$, where $R$ is the ninety degree rotation matrix and $\omega_k$ is a nonnegative scalar. Each pair of consecutive columns, $[q_{2k-1}, q_{2k}]$, corresponds to the real and imaginary parts of an eigenvector of $A$ scaled by $\sqrt{2}$. The scalars $\omega_k$ are the nonnegative imaginary parts of the corresponding eigenvalues, listed in decreasing order [youla1961normal, zumino1962normal]. A simple linear algebra exercise demonstrates that the columns of $Q$ are also proportional to the singular vectors of $A$, and the sequence of scalars $\omega_k$ matches the singular values of $A$. Thus a truncated expansion of $A$ using only the first $r$ blocks is equivalent to the optimal rank $2r$ approximation to $A$ under the Frobenius norm. See Appendix A.
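The following is a minimal numerical sketch of this decomposition, not the authors' implementation. It assumes NumPy and obtains the real Schur form of a skew-symmetric matrix from the Hermitian eigendecomposition of $iA$; the helper names `real_schur_skew` and `block_diag_U`, and the sign convention of the $2 \times 2$ blocks, are assumptions made for illustration.

```python
import numpy as np

def real_schur_skew(A, tol=1e-10):
    """Return (Q, omegas) with A = Q U Q^T for a real skew-symmetric A.

    U is block diagonal with 2x2 blocks omega_k * [[0, 1], [-1, 0]].
    Zero eigenvalues are dropped, so Q may have fewer than n columns.
    """
    A = np.asarray(A, dtype=float)
    assert np.allclose(A, -A.T), "A must be skew-symmetric"
    # 1j * A is Hermitian: eigh returns real eigenvalues and orthonormal
    # complex eigenvectors, with A v = -1j * lam * v for each pair (lam, v).
    lam, V = np.linalg.eigh(1j * A)
    keep = lam > tol
    lam, V = lam[keep], V[:, keep]
    order = np.argsort(-lam)                  # decreasing omega
    omegas, V = lam[order], V[:, order]
    Q = np.empty((A.shape[0], 2 * len(omegas)))
    Q[:, 0::2] = np.sqrt(2) * V.real          # rescaled real parts
    Q[:, 1::2] = -np.sqrt(2) * V.imag         # rescaled imaginary parts
    return Q, omegas

def block_diag_U(omegas):
    """Assemble the block diagonal factor U from the scalars omega_k."""
    R = np.array([[0.0, 1.0], [-1.0, 0.0]])
    U = np.zeros((2 * len(omegas), 2 * len(omegas)))
    for k, w in enumerate(omegas):
        U[2 * k:2 * k + 2, 2 * k:2 * k + 2] = w * R
    return U

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))
A = M - M.T                                   # random skew-symmetric matrix
Q, omegas = real_schur_skew(A)
U = block_diag_U(omegas)
print(omegas)
```

Each scalar $\omega_k$ appears twice among the singular values of $A$, which is why truncating to the first $r$ blocks gives a rank $2r$ approximation.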

When the matrix being decomposed is the performance matrix $F$, each block in the Schur decomposition acts as a scaled version of a disc game where each competitor is assigned embedding coordinates via the orthonormal matrix $Q$. The transitive component can be represented on a line via the ratings, so does not require additional visualization. The cyclic component, $F_c$, is given by $F_c = F - F_t$, where $F_t$ is the transitive component. All three of these matrices are skew symmetric, so $F_c$ admits a Schur decomposition,

$$F_c = Q U Q^\top. \qquad (2)$$

As in PCA, we consider a low rank approximation of $F_c$ associated with expansion onto the first $r$ disc games, where $r$ is chosen large enough to satisfy a desired reconstruction accuracy. The optimal rank $2r$ approximation for $F_c$ is given by replacing $Q$ with $Q^{(r)}$, and $U$ with $U^{(r)}$ in Equation 2, where $Q^{(r)}$ is the first $2r$ columns of $Q$, and $U^{(r)}$ is the upper $2r$ by $2r$ minor of $U$. Optimality is measured using Frobenius norm error.
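The split into transitive and cyclic parts can be sketched as follows. The text does not spell out how ratings are computed, so this sketch assumes a complete tournament in which each rating is the row average of $F$ and the transitive component is the matrix of rating differences; `split_transitive_cyclic` is a hypothetical helper name.

```python
import numpy as np

def split_transitive_cyclic(F):
    """Split a skew-symmetric F into transitive and cyclic parts.

    Assumption: ratings are row means of F, and the transitive part is the
    matrix of rating differences r_i - r_j.
    """
    F = np.asarray(F, dtype=float)
    ratings = F.mean(axis=1)
    F_t = ratings[:, None] - ratings[None, :]
    F_c = F - F_t
    return ratings, F_t, F_c

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
F = M - M.T                                   # random skew-symmetric evaluation matrix
ratings, F_t, F_c = split_transitive_cyclic(F)
print(np.round(ratings, 3))
```

Under this assumption both parts stay skew symmetric, and every row of the cyclic part sums to zero, so no agent retains a net advantage once ratings are removed.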

The matrix $Q^{(r)}$ provides a set of basis vectors. Projection onto those basis vectors defines a new set of coordinates, thereby embedding the competitors. Specifically, let:

$$x_k(i) = \left[\, q_{2k-1}(i),\; q_{2k}(i) \,\right], \qquad (3)$$

where $q_m(i)$ is the $i$-th entry of the $m$-th column of $Q$, and scale each pair of embedding coordinates using the associated eigenvalue magnitude $\omega_k \geq 0$ so that $y_k(i) = \sqrt{\omega_k}\, x_k(i)$. Then $y_k$ maps from competitor indices, $i \in \{1, \dots, n\}$, to points in $\mathbb{R}^2$, and the set $\{y_k\}_{k=1}^{r}$ is a collection of planar embeddings, where $y_k$ is given by projection onto a feature plane spanned by $q_{2k-1}$ and $q_{2k}$.

Note that the Schur decomposition is only unique up to rotation within each feature plane, since complex conjugate pairs of eigenvectors of $F_c$ are only uniquely defined up to their complex phase. Thus we consider two embeddings equivalent if they agree up to rotation within each planar embedding.

The evaluation between agent $i$ and agent $j$ equals a sum over each embedding $k$ of the cross product $y_k(i) \times y_k(j)$ (see Appendix B). That is:

$$[F_c]_{ij} = \sum_{k} y_k(i) \times y_k(j). \qquad (4)$$

Thus, $F_c$ restricted to each planar embedding is a disc game, and the optimal rank $2r$ approximation of $F_c$ is a linear combination of disc games applied to the sequence of planar embeddings $\{y_k\}_{k=1}^{r}$.
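The sum-of-disc-games identity can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's code: it builds planar embeddings from the Hermitian eigendecomposition of $iF$, scales each plane by the square root of its eigenvalue magnitude, and confirms that summing the cross products recovers the matrix. The function names `pta_embeddings`, `disc_matrix`, and `reconstruct` are hypothetical.

```python
import numpy as np

def pta_embeddings(F, tol=1e-10):
    """Return (Y, omegas); Y[k] is the n x 2 embedding for disc game k."""
    F = np.asarray(F, dtype=float)
    lam, V = np.linalg.eigh(1j * F)           # 1j * F is Hermitian
    keep = lam > tol
    order = np.argsort(-lam[keep])
    omegas, V = lam[keep][order], V[:, keep][:, order]
    # Orthonormal real columns are sqrt(2) * (Re v, -Im v); the extra
    # sqrt(omega_k) factor makes each plane a plain cross-product disc game.
    scale = np.sqrt(2 * omegas)
    Y = np.stack([V.real * scale, -V.imag * scale], axis=-1)   # n x r x 2
    return np.transpose(Y, (1, 0, 2)), omegas

def disc_matrix(Yk):
    """Pairwise cross products y(i) x y(j) for one planar embedding."""
    return np.outer(Yk[:, 0], Yk[:, 1]) - np.outer(Yk[:, 1], Yk[:, 0])

def reconstruct(Y, r):
    """Sum of the first r disc games (a rank 2r approximation)."""
    return sum(disc_matrix(Y[k]) for k in range(r))

rng = np.random.default_rng(2)
M = rng.normal(size=(6, 6))
F = M - M.T
Y, omegas = pta_embeddings(F)
full = reconstruct(Y, len(omegas))
err_one = np.linalg.norm(F - reconstruct(Y, 1)) ** 2
print(np.allclose(full, F), err_one)
```

Keeping only the first disc game leaves a squared Frobenius error equal to twice the sum of the squared remaining eigenvalue magnitudes, which is the usual truncation error for a singular value expansion.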

This decomposition is useful for two reasons. First, it depends on a spectral decomposition of the evaluation matrix, so it inherits the key properties that account for the success of PCA. An equivalent construction is introduced in [chen2016modeling], where it is called the "blade-chest-inner" model. The construction in [chen2016modeling] is not based on a spectral decomposition, so it lacks orthogonality and low rank optimality.

In PTA, the embeddings are projections onto orthogonal planes, so each embedding encodes independent information about cyclic competition. The planes act like feature vectors, and each is typically associated with some strategic trade-off (see Section 5). Therefore, as PCA identifies principal components, PTA identifies principal trade-offs: orthogonal planes associated with a sequence of fundamental cyclic modes. The two decompositions differ since PCA uses the singular value decomposition, while PTA uses the real Schur form. Nevertheless, the sequence of embeddings forms optimal low rank approximations to the cyclic component $F_c$, where the importance of each embedding is quantified by the associated eigenvalue. Thus, the sequence of eigenvalues determines the number of disc game embeddings, $r$, required to achieve a sufficiently accurate approximation of $F_c$. The number of disc games is half the effective rank of $F_c$, and is a natural measure of the complexity of cyclic competition. The complexity is distinct from the overall intensity of cyclic competition or the game balance, which depend on the norm $\|F_c\|$ instead of its rank [HHD]. Instead, it counts the number of distinct cyclic modes in the evaluation matrix. It is possible to have many distinct, yet weak cyclic trade-offs, or one, very strong, cyclic trade-off.
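One way to turn the eigenvalue sequence into a complexity count is sketched below. The accuracy measure is an assumption: here it is the share of squared Frobenius norm captured by the leading disc games, and the paper's Appendix A may define it differently. The helper name `complexity` and the example eigenvalues are hypothetical.

```python
import numpy as np

def complexity(omegas, accuracy):
    """Smallest number of disc games whose share of the squared
    Frobenius norm reaches `accuracy` (assumed accuracy measure)."""
    w2 = np.sort(np.asarray(omegas, dtype=float))[::-1] ** 2
    share = np.cumsum(w2) / np.sum(w2)        # cumulative share per disc game
    # First index where the cumulative share reaches the target.
    return int(np.searchsorted(share, accuracy - 1e-12) + 1)

omegas = [3.0, 2.0, 1.0, 1.0]                 # made-up eigenvalue magnitudes
for target in (0.8, 0.9, 0.95):
    print(target, complexity(omegas, target))
```

For these made-up values the cumulative shares are 9/15, 13/15, 14/15, and 1, so the count rises from 2 to 3 to 4 as the target tightens.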

Second, the disc game construction encodes performance relations via geometry. Given coordinates $y_k(i)$ and $y_k(j)$, the advantage of competitor $i$ over competitor $j$, given by $y_k(i) \times y_k(j)$, equals twice the signed area of the triangle with vertices at the origin, $y_k(i)$, and $y_k(j)$. In polar coordinates, each point in a disc game has a radius and phase. The cross product equals the product of the radii, times the sine of the phase difference between $y_k(j)$ and $y_k(i)$. So, the farther a competitor is embedded from the origin, the more intensely they are involved in the associated cyclic mode. For a fixed radius, one competitor gains the most advantage when it is embedded ninety degrees clockwise from its opponent, and possesses an advantage as long as it is embedded clockwise of its opponent. Thus, advantage flows clockwise about the origin. We visualize this flow with a circulating vector field. These geometric properties allow the sequence of disc games to encode a variety of cyclic structures in interpretable visuals. Our subsequent analysis relies heavily on these properties.
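The polar form follows in one line. Writing two embedded points as $y(i) = \rho_i(\cos\theta_i, \sin\theta_i)$ and $y(j) = \rho_j(\cos\theta_j, \sin\theta_j)$, where $\rho$ and $\theta$ are notation introduced here for radius and phase:

```latex
\begin{aligned}
y(i) \times y(j)
  &= \rho_i \rho_j \left( \cos\theta_i \sin\theta_j - \sin\theta_i \cos\theta_j \right) \\
  &= \rho_i \rho_j \sin(\theta_j - \theta_i).
\end{aligned}
```

The advantage of $i$ over $j$ is therefore positive whenever $0 < \theta_j - \theta_i < \pi$, that is, whenever $i$ lies clockwise of $j$, and it is largest at $\theta_j - \theta_i = \pi/2$.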

Figure 1: Disc games 1 to 3 of the Blotto [1,1,1] game with $N$ = 75. Rows: disc game number. Columns 1 - 4: disc game embeddings colored according to agent rating, then agent allocation to zones 1 to 3. Column 5: the phase (measured counterclockwise from the horizontal axis) of the embedding of each strategy. Advantage flows clockwise in phase, so blue beats purple beats yellow beats green beats blue. Column 6: the radius of the embedding of each strategy. High radius corresponds to strong involvement in a trade-off (yellow), while small radius corresponds to low involvement (blue). Each triangle in the fifth and sixth columns represents the space of available allocations. High allocations to zone 1 cluster near the bottom left corner, high allocations to zone 2 cluster near the bottom right corner, and high allocations to zone 3 cluster at the top corner. Labels: representative allocations defining the underlying trade-offs, indicated with bold arrows (see Table 1).

5 Experiments

Here, we illustrate the graphical power of PTA via Blotto and Pokemon. Both exhibit interesting cyclic structure. We emphasize the interpretation of each principal trade-off in terms of game strategy to show that PTA reveals diverse, fine-grained game structure based only on empirical game data.

5.1 Colonel Blotto

Colonel Blotto is a zero-sum, simultaneous action, two player resource allocation game [Blotto-Generalizations]. Each player possesses $N$ troops to distribute to $K$ zones. Each zone $k$ has an associated payout $w_k$. A zone is conquered by a player if they allocate more troops to that zone than their opponent. The conquering player receives the payout. Ties result in both players receiving 0 payout. The player with the highest total payout wins the match. All allocations are revealed simultaneously.

In the simplest case, the payouts are uniform across zones, so the player who conquers the most zones wins the game. Unweighted Blotto is a highly cyclic game since there is no dominating strategy. Every strategy admits a counter. Unless or , all allotments lose to some other allotment. To defeat an allotment, adopt the maxim, "lose big, win small". Mimic the allotment, then redistribute all the units from the zone with the most units as uniformly as possible across the remaining zones. Then, unless all zones were allotted one unit, the exploiting strategy sacrifices a loss in one zone to win in more than one other zone. In general, the more an allotment commits to a single zone, the more easily it is defeated. Unweighted Blotto is also relatively complex, since the zones are indistinguishable. Thus unweighted Blotto admits $K!$-fold symmetries with respect to the zone labels.
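The "lose big, win small" counter can be sketched in a few lines. This is an illustrative sketch assuming NumPy; the function names `blotto_score` and `lose_big_win_small` are hypothetical, and the 0.5 / 0 / -0.5 scoring is one convenient convention for win, tie, and loss.

```python
import numpy as np

def blotto_score(a, b, weights=None):
    """Score of allotment a against b: 0.5 win, 0 tie, -0.5 loss."""
    a, b = np.asarray(a), np.asarray(b)
    w = np.ones(len(a)) if weights is None else np.asarray(weights, dtype=float)
    margin = np.sum(w * np.sign(a - b))       # payout difference; tied zones pay 0
    return 0.5 * float(np.sign(margin))

def lose_big_win_small(a):
    """Give up the largest zone and spread its troops over the other zones."""
    a = np.asarray(a)
    counter = a.copy()
    big = int(np.argmax(a))
    others = [k for k in range(len(a)) if k != big]
    share, extra = divmod(int(a[big]), len(others))
    counter[big] = 0
    for rank, k in enumerate(others):
        counter[k] += share + (1 if rank < extra else 0)
    return counter

target = np.array([40, 20, 15])
counter = lose_big_win_small(target)
print(counter, blotto_score(counter, target))
```

The counter concedes the zone where the target is strongest and wins both remaining zones, which is enough to take the match in the unweighted three zone game.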

Introducing nonuniform weights breaks the exchange symmetry of the zones. This changes the set of possible win conditions, and subsequently the overall cyclicity, complexity, and strategic trade-off structures revealed by PTA.

We consider each unique strategy as a separate "agent", parameterized by the corresponding allotment strategy. We generate agents by randomly sampling over the strategy space using a Dirichlet distribution with the support equal to the number of zones. After sampling, we compare each pair of strategies in the population. Each matchup is deterministic and results in a win, tie, or loss, assigned scores (0.5, 0, -0.5) respectively. We construct the associated evaluation matrix $F$ by setting $F_{ij}$ to the score of strategy $i$ against strategy $j$.
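A possible version of this sampling and scoring pipeline is sketched below. Several details are assumptions not stated in the text: the Dirichlet concentration parameters are all set to one, continuous samples are converted to integer allotments by largest-remainder rounding, and duplicate allotments are merged. The function names are hypothetical.

```python
import numpy as np

def sample_allotments(n_agents, n_troops, n_zones, rng):
    """Dirichlet samples rounded to integer allotments summing to n_troops."""
    p = rng.dirichlet(np.ones(n_zones), size=n_agents)   # assumed flat Dirichlet
    raw = p * n_troops
    base = np.floor(raw).astype(int)
    short = n_troops - base.sum(axis=1)       # troops still unassigned per agent
    # Largest-remainder rounding (an assumed rounding rule).
    order = np.argsort(-(raw - base), axis=1)
    for i in range(n_agents):
        base[i, order[i, :short[i]]] += 1
    return np.unique(base, axis=0)            # one agent per unique strategy

def evaluation_matrix(allotments, weights):
    """F[i, j] = 0.5, 0, or -0.5 when strategy i wins, ties, or loses to j."""
    a = allotments[:, None, :]
    b = allotments[None, :, :]
    margin = np.sum(weights * np.sign(a - b), axis=-1)
    return 0.5 * np.sign(margin)

rng = np.random.default_rng(0)
agents = sample_allotments(200, 75, 3, rng)
F = evaluation_matrix(agents, np.ones(3))
print(agents.shape, F.shape)
```

The resulting matrix is skew symmetric by construction, so it can be passed directly to a real Schur decomposition.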

Blotto Example 1

Table 1 summarizes the principal trade-offs associated with each disc game. These trade-offs are the most important sources of cycles in the tournament, accounting for 80% of its structure.

Principal Trade-Offs
D.G. | Allocation Types | Location in Simplex                 | Example | Relation
1    |                  | corners, midpoints of edges, center |         | < < <
2    |                  | corners, shifted counterclockwise   |         | < < <
3    |                  | corners, shifted clockwise          |         | < < <
Table 1: Principal trade-offs associated with the first three disc games. The columns list the allocation types involved in each disc game (D.G.), their location in the simplex of possible allocations, provide an example set of allocations, and the competitive relations between the types. Note that the allocations involved in the trade-off defined by a disc game correspond to locations in the radius panels in Figure 1 shaded green or yellow. Advantage in a disc game flows clockwise, so can be inferred from the phase panel.

PTA allows elegant visualization of relevant game structure by reducing a game to a small set of key trade-offs. We start by looking at the $K$ = 3, $N$ = 75 Blotto game with uniform payouts. In general, the number of distinct allotments in a $K$ battlefield, $N$ troop Blotto game grows at $O(N^{K-1})$, but the complexity, which reflects the underlying number of cyclic modes, converges to a constant value associated with a continuous Blotto game, where commanders can allocate an arbitrary fraction of their force to each zone. Unweighted $K$ = 3, $N$ = 75 Blotto admits 2926 allotments, but has a 3! fold exchange symmetry under permutations of the battlefield labels, leaving roughly 488 distinct allotments. We require only 3 disc games to reconstruct the evaluation matrix to 80% accuracy, 6 at 90% accuracy, and 12 at 95% accuracy. Thus the game has complexity 12 at a 95% standard. Trade-offs 4 - 12 represent refinements of the trade-offs present in the first three disc games, so PTA really allows a reduction in complexity from 2926 allocations (absent prior knowledge regarding symmetries), to 3 fundamental cyclic modes. Thus, PTA can effectively separate the underlying complexity of a game from the size of its strategy space. See Appendix A for more discussion on complexity.
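The allotment counts quoted above follow from a standard stars-and-bars argument, which the short check below reproduces. The function name `n_allotments` is illustrative.

```python
from math import comb, factorial

def n_allotments(n_troops, n_zones):
    """Ways to split n_troops identical troops over n_zones labelled zones."""
    return comb(n_troops + n_zones - 1, n_zones - 1)

total = n_allotments(75, 3)
print(total)                     # 2926
print(total / factorial(3))      # about 487.7

# Brute-force check: choose troops for zones 1 and 2, zone 3 gets the rest.
brute = sum(1 for a in range(76) for b in range(76 - a))
print(brute == total)
```

For fixed $K$ the binomial coefficient grows like $N^{K-1}/(K-1)!$, which is why the strategy space expands quickly with the number of troops.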

The exchange symmetry of the zones is apparent in the sequence of eigenvalues, $\omega_k$, representing disc game importance. Exchanges introduce 6 permutations under which the evaluation matrix is invariant. Consequently, the $\omega_k$ come in sets of three, where each represents a pair of eigenvectors. Eigenvectors associated with identical eigenvalues are not uniquely defined. Instead, they are drawn from a subspace of dimension equal to the multiplicity of the repeated eigenvalue. Consequently, all of the eigenvectors are chosen arbitrarily from six dimensional spaces.

When the evaluation matrix has repeated eigenvalues, the associated disc game embeddings are not uniquely defined. Any unitary transform of the set of eigenvectors sharing an eigenvalue defines a valid embedding. Thus, symmetry presents an unusual challenge: degeneracy. In our case, the disc games come in sets of three, each representing an arbitrary rotation of a six dimensional object. Consequently, we consider multiple disc games simultaneously. This issue was not addressed in previous work, which largely considered only a single dominant disc game. Generic games should not exhibit such strong symmetries, so such degeneracy will be rare, and likely confined to toy examples.

We analyze the three leading disc games to identify the most important allocation trade-offs. Figure 1 shows the first three disc games colored by rating, allocation to the three zones, and the mapping to phase and radius in each disc game as a function of allocation. All three share the same eigenvalue, so they are equally important and could be mixed. Nevertheless, these three disc games represent distinct trade-offs in allocations that can be easily explained.

The specific trade-offs can be identified directly from the disc games when colored by allocation. Consider the points labelled 1, 2, and 3 in the first disc game. Each maximizes the radius of the scatter cloud while moving along its boundary, so they represent the allocations most involved in the cycle. The low rated points at the bottom of the scatter allocate primarily to one zone (yellow in panels 2 - 4).

Figure 2: The first nine disc games of the $N$ = 75, [1,1,1] Blotto game. Each column is a separate disc game. The first row shows the phase assigned to each allocation. The second row shows the radius assigned to each allocation. The disc games are labelled by trade-off type. Consecutive sets of three disc games share the same eigenvalue and are grouped by the spatial scale of trade-offs represented in allocation space. The shared eigenvalues and percent recovery of the evaluation matrix are shown beneath each set.

Moving clockwise, the next extremum occurs at the top of the scatter. It is high rated, and has nearly equal allocation across all three zones (colored green in panels 2 - 4). Uniform allocations are rated highly since they perform well against most randomly sampled allocations, particularly those lying along a line connecting a corner of the simplex to its center. This induces a transitive trend among the bulk of the allocations moving from allocations that prioritize one zone, to allocations that treat the zones equally. This transitive trend is represented by the general shift of the disc game leftward off the origin. This subset of allocations competes transitively, producing the regular gradient from purple to yellow in rating when moving clockwise from the bottom to the top in the scatter.

Not all allocations satisfy this transitive trend. Allocations that prioritize two zones counter the uniform strategy, and are countered by allocations that prioritize a single zone. For example, allocation [70,0,5] defeats [38,37,0]. Thus, allocations lying on the midpoints of an edge of the simplex lose to allocations near either neighboring endpoint. These counters close the cycle, and are represented by the rightmost pair of corners labelled 2 in disc game 1. Panels 2 - 4 show that each such corner receives an intermediate allocation in two zones (green), but little to none in the third (dark blue).

Similar visual analysis identifies the RPS cycles among cyclic permutations of allocations [H,M,L] and [L,M,H] shown in disc games 2 and 3. For example, the leftmost corner of the scatter cloud shown in disc game 2 receives a high allocation in zone 1 (teal), an intermediate allocation in zone 2 (blue-green), and a low allocation in zone 3 (dark blue). Walking from R to P to S, the allocation pattern shifts cyclically. The same analysis applies to disc game 3, starting from [L,M,H].

Figure 2 shows the phase and radius assigned to each allocation in the simplex. Strikingly, subsequent disc games imitate the disc game 1-3 trade-offs, only at higher frequency on a smaller spatial scale in allocation. This suggests that the disc games may act like Fourier modes, where early disc games capture low frequency, global trade-offs, and later disc games capture high frequency, local trade-offs. It also suggests that orthogonality may not be the appropriate notion of independence for trade-offs. A sharper notion of equivalency is needed. Methods like nonnegative matrix factorization, which address similar issues among PCA features [lee1999learning], suggest an avenue for further development. An example that produces explicit sine series is discussed in the Appendix.

Figure 3: The first three disc games of the N=45 [2,3,4] blotto game are shown. Each row is a separate disc game. The first column colors the game by ratings. Columns 2-4 by allocation 1-3. Boxed panels mark which zone allocation determines the phase in each disc game. The final column shows phase as a function of allocation, with arrows marking the advantage accrued by moving clockwise.

Blotto Example 2

Next, consider a weighted three zone example with weights [2,3,4]. This weighting changes the win conditions. There are now two distinct types of win conditions: either win any two zones, or tie on one zone and win in the higher valued of the remaining two. The former win condition is the same as in the unweighted case, and applies to the majority of allocation pairs. The latter case disallows ties, and breaks the exchange symmetry central to the previous example. As a result, the eigenvalues no longer come in repeated triples (though they still cluster in groups of three and decay at essentially the same rate). Therefore, each disc game is uniquely defined.

Figure 3 shows the first three disc games with the same coloring structure and phase plot structure as before. Note the cyclic structure of the disc games. Each scatter cloud forms an annulus around the origin, shifted so that it passes close to the origin on one side. In fact, the scatter patterns in the three disc games are roughly identical after rotation. The near symmetry of the first three disc games reflects the exchange symmetry in the outright win conditions for each zone. This demonstrates that PTA can discover fundamental game symmetries absent a priori knowledge.

Each disc game represents a trade-off associated with allocation to a single zone. This is most visible by looking diagonally from bottom left to top right in columns 2-4. The phase about the annulus in the first disc game is closely correlated with allotment into the third zone, as illustrated by the monotonic change in color from purple to yellow about the annulus. Similarly, phase in the second disc game is associated with allotment into zone two, and phase in the third disc game is associated with allotment into zone one. The order of this mapping is also consistent with the payoff structure. Zone three is more important than zone two, which is more important than zone one.

The correlation of phase with allotment is apparent in the phase plots provided in the last column. Phase in the first disc game completes one full cycle moving down from the top corner of the simplex (exclusive allotment to zone 3) to the bottom edge (no allotment to zone 3). The same pattern repeats for the subsequent disc games, only with respect to a different zone.

As before, each disc game admits clear interpretation. Each annulus is shaped roughly like a capital "D". In each disc game, strategies that overallot to the associated zone appear at the clockwise-most corner of the "D", so are the most easily exploited. Moving clockwise, ratings increase with an approach to uniform allotment, then decrease again as underallotment to the associated zone prevents uniform allotment. This, then, is the underlying trade-off: overallotment ensures losses in other zones, while underallotment ensures loss in the focal zone.

5.2 Pokemon

We conclude by analyzing Pokemon. Pokemon originated on the Nintendo Game Boy console, but has since been played on a variety of mediums including playing cards [pokemon-site]. Pokemon is of considerable interest from a game design perspective since the creators must design certain trade-offs to keep the game balanced and engaging. The game is made up of creatures, called Pokemon, that come in many varieties. In particular, the design should reward players for collecting diverse teams. Thus, each Pokemon has a different type, and each type has its own set of strengths and weaknesses. These different types satisfy interlocking cyclic relationships.

The data used in this analysis comes from an open-source Kaggle data set [pokemon-data]. The original data has 800 Pokemon, but we removed the 65 "legendary" Pokemon to simplify the analysis. The data consists of battle outcomes and Pokemon attributes. Battle outcomes were converted into an evaluation matrix using logistic regression (see Appendix E). Here, we apply the Schur decomposition directly to the full evaluation matrix to show that disc game embedding can successfully isolate a dominant transitive component.

Figure 4: Disc games 1,2 and 4 for pokemon. Disc game one is colored by rating. Disc game two is colored by type, then generation. Disc game four is colored by rating, then generation.

Figure 4 shows three of the first four disc games, chosen for their significance. The first disc game is the most important, and is clearly transitive since all points fall on a curve that does not enclose the origin. Position along the curve is closely correlated with speed, so speed determines rating.

We query by attribute to interpret the remaining disc games. To start, consider the "type" attribute. The second disc game is clearly clustered by type (see Figure 4). A variety of RPS relationships are apparent among the type clusters. Any loop of clusters containing the origin corresponds to a cycle of type advantage. The intensity of the corresponding cycle (curl) is proportional to the area of the convex hull formed by the clusters. Focus on the large clusters most involved in the trade-off, i.e. furthest from the origin. Figure 5 summarizes the RPS relations between these clusters. First, notice the highlighted triangle formed by the Water-Fire-Grass RPS relationship. The disc game shows the expected relationship among the three types since the triangle contains the origin. Thus, PTA successfully identifies known game structure without any domain specific knowledge.

Additional clusters on the outer ring satisfy more intricate relations. The other three types are “bug”, “rock” and “ground”. To summarize these relations we construct a coarse-grained evaluation matrix. Specifically, each entry is the average performance of Pokemon of one type against Pokemon of another type in the second disc game. The associated matrix heat-map is shown in the middle panel of Figure 5. The types are ordered by angle moving clockwise about the origin.
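The coarse-graining step can be sketched as a block average over type labels. This is a minimal illustration under stated assumptions: the function name `coarse_grain`, the toy matrix, and the two-type labelling are all hypothetical, and in the analysis above the input would be the evaluation matrix generated by the second disc game alone.

```python
from collections import defaultdict


def coarse_grain(F, labels):
    """Average the entries of F over every ordered pair of label groups.

    F is a square matrix (list of lists); labels[i] is the group of agent i.
    Returns the sorted group names and a dict keyed by (row_group, col_group).
    """
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    names = sorted(groups)
    C = {}
    for a in names:
        for b in names:
            vals = [F[i][j] for i in groups[a] for j in groups[b]]
            C[(a, b)] = sum(vals) / len(vals)
    return names, C


# Toy skew-symmetric evaluation matrix for four agents (made-up numbers).
F = [
    [0.0, 0.2, 1.0, 0.8],
    [-0.2, 0.0, 0.6, 1.2],
    [-1.0, -0.6, 0.0, 0.1],
    [-0.8, -1.2, -0.1, 0.0],
]
labels = ["water", "water", "fire", "fire"]
names, C = coarse_grain(F, labels)
```

Because the input is skew-symmetric, the coarse-grained matrix is skew-symmetric as well, so it can be compared entry by entry with a skew-symmetrized attack matrix.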

We compared these relations with available game design matrices known as “attack matrices". These matrices have a row and column for each Pokemon type, with each intersection listing the advantage of one Pokemon type over the other. We use the provided attack matrix from [pokemon-attack-matrix]. An attack matrix is written in terms of multiples, so Pokemon that are evenly matched have a advantage. We bucket the range of attack multipliers into 5 bins ranging from to , skew-symmetrize via . The result is the rightmost panel in Figure 5.

The coarse grained summary is strikingly similar to the provided attack matrix. The apparent structural parity in these two matrices highlights the virtues of PTA. Without any domain knowledge, access to attribute data, or any explicit instruction to identify clusters, PTA clustered Pokemon by their most relevant attributes (type) then encoded a game mechanism (type specific attack multipliers) directly from the cluster locations. Conversely, the second disc game shows how cyclic relations introduced at the mechanism level translate into realized cyclic relations in actual performance.

Coloring the disc games by “generation”, i.e. Pokemon release date, reveals design choices. The game is frequently updated by the addition of new Pokemon. Updates present a design challenge: game designers must introduce desirable new Pokemon without upsetting the game balance. The fourth disc game, shown in the far right plot of Figure 4, is balanced in that rating does not predict phase, and instead correlates with radius. Strong and weak Pokemon are closer to the origin, while Pokemon of intermediate rating are more involved in the trade-off. This reveals a spinning top structure characteristic of many games [czarnecki2020real]. Phase is instead predicted by generation. Each generation possesses an advantage over its predecessor, as illustrated by the fade from purple to yellow. Balance is retained since generational advantage is periodic. The same clockwise generation shift reappears in the second disc game. Within type, new beats old. For example, the bottom-most cluster (grass) clearly trends old to young. Cross-type relations are largely unchanged.

Figure 5: Left: RPS sub-game discovery. Each cluster type is represented by a pokemon character of that type, Middle: Empirical performance matrix, Right: Performance matrix derived from [pokemon-attack-matrix]

6 Conclusion

Following Balduzzi et al. (2018), we have demonstrated that all evaluation matrices admit an expansion onto a sum of disc game embeddings. We suggest the name PTA based on the close analogy with PCA. Through examples, we have demonstrated that embeddings produced by PTA can reveal a surprising variety of competitive structures from outcome data alone. Without prior knowledge of Pokemon, PTA was able to reveal trade-offs in the game arising from speed, type, and generation attributes. Game design choices related to both type and generation were discovered without a priori knowledge of their existence. Likewise, without any knowledge of the game rules or win conditions, PTA identifies symmetries and win condition trade-offs in Blotto. These methods are quite general and can be applied to any two-player constant-sum game, or any decision problem involving pairwise choice. Future work could expand the class of games and provide more general methods for finding embeddings, such as a functional theory connecting performance with attribute space.


Appendix B Principal Trade-off Analysis

b.1 Schur Decomposition is a Sum of Disc Games

Here we prove that the Schur decomposition (real Schur form), is equivalent to a sum of disc games applied to the embedding maps .

Recall the embedding construction. Given a skew symmetric matrix , write where is real, orthonormal, is block diagonal with diagonal blocks , and is the two by two ninety degree rotation matrix. Let . Then, the rank approximation to is:

(5)

Recalling the embedding construction, write:

(6)

Thus, restricted to each planar embedding is a disc game and the optimal rank approximation of is a linear combination of disc games applied to the sequence of planar embeddings .
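The decomposition can be checked numerically. The sketch below is not the authors' implementation: it swaps the real Schur form for NumPy's Hermitian eigensolver applied to `1j * F`, which should span the same planes for a real skew-symmetric matrix. The names `pta` and `disc`, the tolerance, and the ordering of the two coordinates within each plane are assumptions chosen for illustration.

```python
import numpy as np


def pta(F, tol=1e-10):
    """Split a real skew-symmetric matrix F into disc games.

    Returns (omegas, embeddings). omegas is sorted in decreasing order and
    embeddings[k] is an (n, 2) array of planar coordinates such that
    F[i, j] ~= sum_k cross(embeddings[k][i], embeddings[k][j]).
    """
    F = np.asarray(F, dtype=float)
    assert np.allclose(F, -F.T), "F must be skew-symmetric"
    # 1j * F is Hermitian; its eigenvalues are real and come in +/- omega pairs.
    vals, vecs = np.linalg.eigh(1j * F)
    omegas, embeddings = [], []
    for idx in np.argsort(-vals):
        w = vals[idx]
        if w <= tol:
            break  # remaining eigenvalues are zero or the mirrored negatives
        v = vecs[:, idx]
        # Real and imaginary parts give an orthonormal basis of one plane.
        q1 = np.sqrt(2.0) * v.real
        q2 = np.sqrt(2.0) * v.imag
        embeddings.append(np.sqrt(w) * np.column_stack([q2, q1]))
        omegas.append(w)
    return np.array(omegas), embeddings


def disc(Y):
    """Disc game on one plane: pairwise 2D cross products of the rows of Y."""
    return np.outer(Y[:, 0], Y[:, 1]) - np.outer(Y[:, 1], Y[:, 0])


# Rock-paper-scissors: a single disc game recovers the whole matrix.
rps = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
rps_omegas, rps_embeddings = pta(rps)
rps_rebuilt = sum(disc(Y) for Y in rps_embeddings)
```

Truncating the list of embeddings after the first few entries gives the low rank approximation discussed above; keeping all of them rebuilds the input up to floating point error.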

b.2 PTA and Fourier Series

Both the and the blotto examples exhibit strikingly modal disc games that repeat at increasing frequency, and on smaller spatial scales, with increasing disc game number. These patterns suggest an analogy to Fourier series. To make the analogy more concrete we present one last example.

Consider blotto. Since the net value of the first two zones is less than the value of the third zone, a player wins the overall game if they win the third zone, or tie in the third zone and win the second. Otherwise they tie in all zones or lose. Thus, the performance function where is the indicator function for the event in the subscript. Performance can be reduced to a comparison of a single agglomerated trait, . Then . Thus, performance is a step function applied to the difference . The difference is dominated by the difference in allocations to the third zone.
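The reduction to a single trait can be verified by brute force on a small game. The sketch below assumes three zones with payouts (1, 2, 4), whole-unit allocations, and a performance of +1, 0 or -1 for a win, tie or loss. The particular trait used here, allocation to zone 3 with allocation to zone 2 as a tie-breaker, is a hypothetical stand-in and need not match the agglomerated trait defined in the text.

```python
def allocations(n_troops, n_zones):
    """All ways to split n_troops whole units across n_zones."""
    if n_zones == 1:
        return [(n_troops,)]
    return [
        (first,) + rest
        for first in range(n_troops + 1)
        for rest in allocations(n_troops - first, n_zones - 1)
    ]


def sign(v):
    return (v > 0) - (v < 0)


def performance(x, y, weights=(1, 2, 4)):
    """+1 if x wins more payout than y, -1 if y does, 0 on a tie."""
    score = sum(w * sign(a - b) for w, a, b in zip(weights, x, y))
    return sign(score)


def trait(x, n_troops):
    """Hypothetical single trait: zone 3 first, zone 2 as a tie-breaker."""
    return x[2] * (n_troops + 1) + x[1]


n_troops = 5
strategies = allocations(n_troops, 3)
# Performance is a step function of the trait difference for every pairing.
reduces_to_trait = all(
    performance(x, y) == sign(trait(x, n_troops) - trait(y, n_troops))
    for x in strategies
    for y in strategies
)
```

The check relies on both players fielding the same total: a tie in zones 3 and 2 then forces a tie in zone 1, which is why no third tie-breaker is needed.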

Figure 6 shows the first nine disc games. Notice that all allocations are embedded onto circles, or, in the first case, half circles. Moreover, phase (position along each circle) is entirely a function of the agglomerated trait . This form is apparent in the phase panels, where phase is close to constant for fixed allocation to zone 3, but is tilted slightly to account for allocation to zone 2. The first disc game is transitive and completes one half circle moving clockwise from zero allocation to zone 3 to exclusive allocation to zone 3. Disc games 2 and 3 complete 1.5 and 2.5 rotations each when moving from zero allocation to zone 3 to exclusive allocation to zone 3. The pattern continues for the first 9 disc games. Moreover, the circle radii decay geometrically (as shown by the sequence of gold scatter points in the last panel of Figure 7).

Figure 6: Top row: The first 9 disc games for blotto, colored by rating. Bottom row: Phase as a function of allocation in each disc game. The top corner of the triangle corresponds to exclusive allocation to zone 3. Note that phase allocation increases in frequency with increasing disc game. Colors denote phase, with advantage flowing clockwise. The eigenvalues associated with each disc game are provided, along with the percent of recovered by the running sum of the disc games. Note that the first disc game is responsible for 90% of the structure of , and the first two disc games account for 95% of its structure.

These features are hallmarks of a sine series embedding. We show below that any translationally invariant performance function of a single attribute can be represented by a sum of disc games, where the embedding into each disc game maps to a circle, the attribute maps to a phase coordinate around the circle, and the radii of the circles in each disc game are controlled by the coefficients of a sine series expansion. The performance function only depends on the difference in allocations, , so is translationally invariant, and is a function of a single trait, . Thus, it admits a sine series expansion. Note, the subsequent analysis does not guarantee low rank optimality, so only shows that disc game embedding via sine series is possible, not that PTA will necessarily produce such an embedding.

Consider a performance function of the form:

(7)

for , for arbitrary amplitude , and frequency .

Performance functions of this form are easy to embed, since the disc game uses a cross product. The cross product between two points on a plane, expressed in polar coordinates, is the product of their radii times the sine of the difference in their phases. Therefore, if:

(8)

then:

(9)

Lemma 1: [Trigonometric Performance Functions of One Trait] If for both in a one-dimensional trait space, then is disc game embeddable using the embedding:

.

Notice that this construction maps the intervals of length in to the circle of radius centered at the origin. It follows that, if a performance function is embeddable onto a circle centered at the origin then there exists a mapping from trait space to the real line where performance is of the form 7.
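The identity behind Lemma 1 can be written out in one line. The notation below is assumed for illustration and may differ from the original equations: $x, y$ are scalar traits, $A > 0$ is the amplitude, $\omega$ the frequency, $e(\cdot)$ a candidate embedding, and $u \times v = u_1 v_2 - u_2 v_1$ the planar cross product. The sign of the second coordinate depends on the orientation convention chosen for the disc game.

```latex
\begin{align*}
  e(x) &= \sqrt{A}\,\bigl(\cos(\omega x),\ -\sin(\omega x)\bigr), \\
  e(x) \times e(y)
    &= A\,\bigl(\sin(\omega x)\cos(\omega y) - \cos(\omega x)\sin(\omega y)\bigr) \\
    &= A \sin\bigl(\omega (x - y)\bigr).
\end{align*}
```

Every point lands on the circle of radius $\sqrt{A}$, and the trait only enters through the phase $\omega x$, which is the polar-coordinate statement made above.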

This result extends easily to linear combinations of sinusoidal functions with varying frequencies. Consider a performance function of the form:

(10)

Then, can be recovered using a sum of disc game embeddings, where the embedding has the form:

(11)

Note that all performance functions of this kind are translation invariant, since they are functions of the difference , which does not change after shifting and by some amount .

Theorem 1: [Translation Invariant One Trait Performance Functions] Suppose that is a one-dimensional trait space, and is translation invariant. Then there exists a function such that . Suppose in addition that is periodic with period , or is contained inside an interval with length . Then is disc game embeddable using a countably infinite sequence of disc games, which correspond to the sine series expansion of and converge under the same conditions as the sine series. Moreover, each disc game represents a term in the sine series, and maps to a subset of a circle centered at the origin with radius fixed by the corresponding coefficient in the sine series.

Proof: If is translation invariant then for some function . Since , must be an odd function. If is contained inside an interval of length , then can be extended to an odd, continuous, periodic function, or an odd periodic function. If not, then, by assumption, is periodic with period .

All integrable periodic functions on the real line can be approximated with a Fourier series. If the function is real valued and odd, then the Fourier series is a sine series of the form:

(12)

Each term in the sine series can be reproduced by a disc game embedding using the method for embedding sinusoidal functions introduced before. Specifically, let:

(13)

where is the amplitude in the sine series of :

(14)

Then, a partial expansion in terms of disc games equals the term sine series expansion of :

(15)

Thus, convergence of the sequence of disc game embeddings follows convergence of the sine series expansion.
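A small numerical check makes the correspondence concrete for a step function. The sketch assumes traits in the unit interval, so that differences lie in (-1, 1) and the sign function can be extended with period 2. Its sine series then has coefficients 4/(n pi) on odd n, a standard result. The function names and the orientation of the embedding are assumptions.

```python
import math


def cross(u, v):
    """Planar cross product used by the disc game."""
    return u[0] * v[1] - u[1] * v[0]


def embed(x, n):
    """Planar coordinates of trait x in the disc game for odd harmonic n."""
    b_n = 4.0 / (n * math.pi)  # sine coefficient of the period-2 step function
    r = math.sqrt(b_n)         # circle radius fixed by the coefficient
    return (r * math.cos(n * math.pi * x), -r * math.sin(n * math.pi * x))


def disc_game_sum(x, y, n_games):
    """Sum of the first n_games disc games (harmonics 1, 3, 5, ...)."""
    return sum(cross(embed(x, n), embed(y, n)) for n in range(1, 2 * n_games, 2))


def sine_partial_sum(s, n_games):
    """Partial sine series of the step function evaluated at difference s."""
    return sum(
        4.0 / (n * math.pi) * math.sin(n * math.pi * s)
        for n in range(1, 2 * n_games, 2)
    )


x, y = 0.8, 0.3
few = disc_game_sum(x, y, 3)
many = disc_game_sum(x, y, 200)
```

With three disc games the sum already has the right sign, and with two hundred it is close to one, which mirrors the slow convergence of a sine series to a step function.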

It remains to show that, after sampling a finite set of agents, the result of PTA recovers the sine series representation. Sine series are low rank optimal in this case since, if the agents are ordered by increasing , the evaluation matrix is of the form:

(16)

This matrix is real, skew-symmetric, Toeplitz, and is diagonalized by the discrete Fourier transform, so PTA produces a sine series. The sequence of disc games acts as the sine series expansion of a step function, with each higher order disc game corresponding to a higher order correction of an approximation to the step function. The first disc game is transitive and captures 90% of the structure of the evaluation matrix. Subsequent disc games correct the first disc game in order to produce a step function. While it only takes two disc games to recover 95% of the structure of the evaluation matrix, so that the 95% complexity of blotto is 2, the subsequent eigenvalues decay slowly, so stricter accuracy requirements lead to large complexities. The slow decay of the eigenvalues is a natural consequence of the slow convergence of sine series to a step function. Here it is clear that the complexity predicted by PTA overstates the complexity of the underlying game, since subsequent disc games are best interpreted as corrections that gradually finesse the first disc game, not distinct trade-offs.

More general proof and exploration is saved for future work.

Appendix C Disc Games

c.1 Geometry

Principal trade-off analysis is a useful visualization technique since disc games encode performance relations via embedding geometry. Reading a disc game requires familiarity with this geometry, namely with the various interpretations of a cross product. Here we review some relevant relations.

Cross products are closely related to area in the embedding. Given a pair of competitors with embedding coordinates and , the performance of against in disc game , , equals the signed area of the triangle with vertices , , and the origin. Further, the degree of cyclicity on a loop of competitors can be computed by evaluating a path sum of the advantages around the loop, i.e.  [HHD]. The curl on loop equals the signed area of the loop traced out in each embedding [strang2022quad], summed over the embeddings. It follows that curl inherits the invariances of areas. In particular, curl is translation invariant.
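The link between loop sums and area can be checked directly. The sketch below uses the plain cross product, for which the sum of advantages around a loop is twice the shoelace area. A constant factor of this kind can be absorbed by rescaling the embedding. The function names and the example triangle are illustrative only.

```python
def cross(u, v):
    """Planar cross product: advantage of u over v in a disc game."""
    return u[0] * v[1] - u[1] * v[0]


def loop_curl(points):
    """Sum of advantages cross(p_i, p_{i+1}) around a closed loop."""
    n = len(points)
    return sum(cross(points[i], points[(i + 1) % n]) for i in range(n))


def shoelace_area(points):
    """Signed area of the polygon, positive for counter-clockwise loops."""
    n = len(points)
    return 0.5 * sum(
        points[i][0] * points[(i + 1) % n][1]
        - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )


def translate(points, t):
    return [(p[0] + t[0], p[1] + t[1]) for p in points]


triangle = [(1, 0), (0, 1), (-1, -1)]            # loop enclosing the origin
curl = loop_curl(triangle)                        # 1 + 1 + 1
area = shoelace_area(triangle)
moved_curl = loop_curl(translate(triangle, (5, -2)))
```

Here `curl` equals `2 * area`, and `moved_curl` equals `curl`, which illustrates the translation invariance of curl stated above.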

The cyclic component on a given edge equals the average curl over all possible triangles formed by drawing a random . Since curl is translation invariant, the cyclic component of competition is translation invariant. In contrast, the transitive component of competition is not translation invariant, and translation in a disc game induces a transitive component of competition [balduzzi2018re]. By subtracting from to recover we center all the rows and columns, so the scatter cloud of embedded competitors will be centered at the origin. In contrast, if we embed directly, then transitivity arises from translation of each scatter cloud away from the origin. If the origin is not included in the convex hull of the embedded agents, then competition is transitive.

In contrast, scaling the embedding coordinates does change the predicted performance relations. Area is proportional to length squared, so scaling the embedding coordinates by scales the associated cyclic component of competition by . The scaling from to , was adopted to ensure that unit area in embedding generates unit curl. It follows that the area encompassed by a set of points in a disc game embedding directly represents the amount of cyclic competition among those agents, and hence the importance of that embedding.
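Both effects, translation and scaling, follow from bilinearity of the cross product and are easy to confirm numerically. In this sketch the points, the shift `t`, and the helper names are arbitrary choices made for illustration. The function `rating` is a hypothetical name for the term that a shift contributes to each competitor.

```python
def cross(u, v):
    """Planar cross product used by the disc game."""
    return u[0] * v[1] - u[1] * v[0]


def shift(p, t):
    return (p[0] + t[0], p[1] + t[1])


def scale(p, c):
    return (c * p[0], c * p[1])


def rating(p, t):
    """Rating induced on point p by translating the embedding by t."""
    return cross(p, t)


x, y, t = (1, 2), (3, -1), (4, 5)
base = cross(x, y)
shifted = cross(shift(x, t), shift(y, t))

# Translation adds a difference of ratings, i.e. a purely transitive term.
transitive_part = rating(x, t) - rating(y, t)

# Scaling both points by c multiplies the advantage by c squared.
scaled = cross(scale(x, 3), scale(y, 3))
```

The shifted advantage equals `base + transitive_part`, so moving the scatter cloud away from the origin introduces exactly the kind of rating difference that characterizes transitive competition.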

Appendix D Blotto

d.1 Intransitive and complexity analysis

Figure 7: Results for a variety of different blotto games. Left: vs allocation (troops). Middle: complexity vs where complexity is the number of disc games it takes to reach Frobenius norm error rate. Right: For each game we list the eigenvalues in decreasing order using log scale.

Here we look at how PTA can be useful for comparing different blotto games. New blotto games are easily created by varying the game parameters. The number of zones, troops, and payouts can be changed to generate games with varying structure. Other Blotto variants, such as Boolean Blotto, or Colonel Lotto, are not considered here. We focus on a small number of zones — or — to illustrate how the intransitivity () and complexity change for varying and across a number of different payout structures.

We propose three hypotheses. First, while the number of distinct strategies grows combinatorially in and , the allotment problem converges, in the limit of large , to a continuous problem in which each commander can allot an arbitrary fraction of their total force to any zone. Therefore, our structural measures should converge to finite values representing the structure of a Blotto game allowing any fractional unit allotment on the interval . Both measures should increase with increasing towards the limiting case, as the space of available allotment strategies grows with . Second, complexity should increase with increasing , since the number of distinct strategic trade-offs should increase as the number of distinct, exploitable, win conditions increases. Third, small changes in game structure can lead to large differences in both intransitivity and complexity. Small changes can break underlying symmetries, and, the majority functions that determine performance are discontinuous in nearby allocations so small changes in battlefield weights can produce sudden changes in the set of win conditions.

To test these hypotheses, we vary between 5 and 45 while holding fixed at 3 and 4. We consider three distinct payout structures for K=3 and 2 for K=4. For each game we construct and compute the intransitivity using the HHD. The complexity of each game is computed by finding the number of disc games required to reach an error tolerance of (measured in relative Frobenius error). All of the games considered are highly cyclic, with in all but the case. The case is transitive case, since the only win condition is victory in. Note that is not zero for the case since the chosen performance measure cannot be expressed as a linear function of a difference in ratings, so is not perfectly transitive. Nevertheless, if the step function used to assign battlefield outcomes is replaced with any sigmoid , then the case is perfectly transitive with respect to , where rating equals allotment to zone 4.
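The complexity measure can be sketched as a count over the sorted eigenvalues. The snippet assumes that each disc game with eigenvalue omega contributes a fixed multiple of omega squared to the squared Frobenius norm, so the multiple cancels in the relative error. The function name, the default tolerance of 0.05, and the example eigenvalues are hypothetical.

```python
import math


def complexity(omegas, tol=0.05):
    """Smallest number of disc games with relative Frobenius error <= tol.

    omegas: eigenvalue magnitudes, one per disc game, in any order.
    """
    squares = sorted((w * w for w in omegas), reverse=True)
    total = sum(squares)
    if total == 0:
        return 0
    remaining = total
    for k, sq in enumerate(squares):
        # Error of the approximation that keeps the k largest disc games.
        if math.sqrt(max(remaining, 0.0) / total) <= tol:
            return k
        remaining -= sq
    return len(squares)


example = [10.0, 1.0, 0.1]        # made-up spectrum with one dominant mode
strict = complexity(example, tol=0.05)
loose = complexity(example, tol=0.2)
```

For this spectrum the dominant disc game alone leaves a relative error of about 0.1, so `loose` is 1 while `strict` is 2. A slowly decaying spectrum would push the count up quickly as the tolerance shrinks.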

Figure 7 shows the results, and confirms our three hypotheses. First, the intransitivity and complexity for all games plateau for large enough N. This indicates that, beyond a certain N, the game reaches its “strategic capacity”; all meaningful types of trade-offs have been expressed. In both cases, the limiting complexity is strikingly small relative to the size of the strategy space, which grows rapidly in N. For a game with K zones and N units, there are distinct strategies to consider. In the most extreme case tested, N = 45 and K = 4, so the strategy space contains distinct allotments, which can be reduced to distinct trade-offs. Thus PTA can effectively separate the underlying complexity of a game from the size of its strategy space.
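The size of the strategy space can be computed with a standard stars-and-bars count. This assumes that a strategy is any split of N whole units across K zones with empty zones allowed, which may differ from the convention used for the figures. The function names are illustrative.

```python
import math


def count_strategies(n_troops, n_zones):
    """Number of ways to split n_troops whole units across n_zones."""
    return math.comb(n_troops + n_zones - 1, n_zones - 1)


def enumerate_strategies(n_troops, n_zones):
    """Explicit enumeration, used here only to cross-check the formula."""
    if n_zones == 1:
        return [(n_troops,)]
    return [
        (first,) + rest
        for first in range(n_troops + 1)
        for rest in enumerate_strategies(n_troops - first, n_zones - 1)
    ]


largest_k3 = count_strategies(45, 3)
largest_k4 = count_strategies(45, 4)
```

Under this convention, 45 units give 1081 allocations over three zones and 17296 over four, so a complexity in the tens corresponds to a reduction of several orders of magnitude.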

Second, in the right side of Figure 7 the complexity of the K=4 games is much higher than that of the K=3 games.

Third, the difference in complexity from one game to the next is stark. At N = 45, for example, about 15 additional disc games are needed to reach 95% accuracy when going from the [1,2,4] transitive game to the uniform [1,1,1] game. Furthermore, having similar intransitivity does not imply similar complexity. The [1,1,1] and [2,3,4] games have similar intransitivity but differ by about 20 in complexity.

Figure 8: Left: Disc game 3 colored by rating. Middle: Disc game 3 colored by speed. Right: Evaluation matrices generated by disc games 1 (top) and 3 (bottom), with agents ordered by increasing speed. Both evaluation matrices are close to Toeplitz, so they produce evaluations that depend primarily on the difference in speed between agents. The function that returns performance given a speed difference is approximated by sampling the evaluation matrix along the cross-diagonal marked in white. The subpanels containing scatter plots show the sampled evaluations.

Appendix E Pokemon

Here we describe in further detail the construction of the Pokemon data. In the full game a player (or trainer) captures Pokemon to compete against the Pokemon of other players, usually with teams of 6 Pokemon chosen at the player’s discretion. Players also choose the order in which their Pokemon compete since the actual combat is done pairwise. This pairwise interaction is what allowed us to ignore the team aspect and still learn important aspects of the game. Each of the pokemon had a set of attributes. The attributes are shown below.


  1. Type 1: Main Type - Fire, Water, Grass, etc.

  2. Type 2: Secondary Type - Not all pokemon have two types but we did not find this to contribute to any performance tradeoffs in a significant way

  3. HP: Hit points - Indicates how much damage a Pokemon can endure before losing the match.

  4. Attack: Base modifier for normal attacks

  5. Defense: The base damage resistance against normal attacks

  6. Special attack

  7. Special Defense

  8. Speed: This stat largely determines which Pokemon gets to attack first. As combat is turn based, this constitutes a large advantage, which we saw in disc game 1.

There were 50,000 pairwise interactions among the 735 Pokemon that were used. The data for each interaction consisted of the names of the first and second Pokemon as well as the winner of the match. In an individual matchup, each Pokemon has a certain level of HP or health. The two Pokemon take turns attacking one another until one of them loses all of its HP and is declared the loser. The first to attack is determined by some set of attributes that is not explicitly given by the data set, but speed is known to be a large contributing factor. Since we did not have the full interaction graph, we filled in any missing data using logistic regression, producing a win probability matrix. We obtained the evaluation matrix via the logistic link function commonly used in Elo rating. The evaluation for competitor vs competitor is then given by where is the probability that Pokemon beats Pokemon . We applied the Schur decomposition directly to the evaluation matrix to show that disc game embedding can successfully isolate a dominant transitive component (speed).
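The conversion from win probabilities to evaluations can be sketched with the log-odds (logit) transform, which is the inverse of the logistic function used in Elo-style ratings. Whether this matches the exact formula used for the data is an assumption. The function name, the clipping constant `eps`, and the toy probabilities are hypothetical.

```python
import math


def evaluation_matrix(P, eps=1e-9):
    """Map a win probability matrix P to log-odds evaluations.

    P[i][j] is the probability that competitor i beats competitor j.
    Probabilities are clipped to [eps, 1 - eps] to keep the logarithm finite.
    """
    n = len(P)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                p = min(max(P[i][j], eps), 1.0 - eps)
                F[i][j] = math.log(p / (1.0 - p))
    return F


# Toy probabilities with P[j][i] = 1 - P[i][j] (made-up numbers).
P = [
    [0.5, 0.8, 0.3],
    [0.2, 0.5, 0.6],
    [0.7, 0.4, 0.5],
]
F = evaluation_matrix(P)
```

When the probabilities of the two outcomes in each pairing sum to one, the result is skew-symmetric, with an even matchup mapping to zero, so it is a valid input for the decomposition.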

In Figure 8 we show the third disc game left out of the main analysis. It shows a double loop structure with a full inner circle and a half outer circle. Like disc game 1, disc game 3 is, approximately, a curve parameterized by speed. As in the Fourier examples discussed before, the double loop represents a higher order correction to disc game 1. Disc game 1 confers a transitive, monotonically increasing advantage to faster agents. The faster an agent relative to their opponent, the larger their advantage. Disc game 3 adds nuance to this relation by discounting the advantage conferred by small differences in speed, increasing the advantage conferred by intermediate differences in speed, discounting the advantage conferred by large speed differences, and strongly rewarding maximal speed differences (see the evaluation matrix and associated subpanel in the rightmost column of Figure 8).

Note, these corrections to disc game 1 are very small. The eigenvalue for disc game 1 is roughly 15 times larger than the eigenvalue for disc game 3, hence the relationship between speed and performance is largely determined by disc game 1. We did not discuss disc game 3 in the main text for this reason.

References