# Learning Dynamics and the Co-Evolution of Competing Sexual Species

We analyze a stylized model of co-evolution between two purely competing species (e.g., host and parasite), both sexually reproducing. As in a recent model of Livnat et al. [FOCS 2014], the fitness of an individual depends on whether the truth assignment on its n variables, which reproduce through recombination, satisfies a particular Boolean function. Whereas in the original model a satisfying assignment always confers a small evolutionary advantage, in our model the two species are in an evolutionary race, with the parasite enjoying the advantage if the value of its Boolean function matches that of its host, and the host seeking to mismatch its parasite. Surprisingly, this model makes a simple and robust behavioral prediction: the typical system behavior is periodic. These cycles stay bounded away from the boundary, and thus learning-dynamics competition between sexual species can provide an explanation for genetic diversity. This explanation is due solely to the natural selection process; no mutations, environmental changes, etc., need be invoked. The game played at the gene level may have many Nash equilibria with widely diverse fitness levels. Nevertheless, sexual evolution leads to gene coordination that implements an optimal strategy, i.e., an optimal population mixture, at the species level. Namely, the play of the many "selfish genes" implements a time-averaged correlated equilibrium in which the average fitness of each species is exactly equal to its value in the two-species zero-sum competition. Our analysis combines tools from game theory, dynamical systems and Boolean functions to establish a novel class of conservative dynamical systems.

## Authors

10/26/2017


## 1 Introduction

An exciting recent line of work in the theory of computation has focused on the algorithmic power of the evolutionary process (Valiant [48], Livnat et al. [25, 24]). The latter two papers identified as interesting the case of a sexually reproducing, haploid, and panmictic (explained below) species, evolving in a fixed environment according to variants of Multiplicative Weights Update (MWU) dynamics [11, 12], which are typically referred to as "replicator dynamics" in the evolutionary dynamics literature [50]. Curiously, however, Mehta et al. [27] made the discovery that these dynamics lead in the long run (in almost all cases) to a genetic monoculture. This rather contradicts the evidence of natural diversity around us.

Several plausible explanations exist for this discrepancy, including: (a) mutations [28], (b) speciation (e.g., the Bateson-Dobzhansky-Muller model) [19], (c) the mathematical assumptions are too far from reality, (d) "in the long run" is longer than geologic time. There is, however, a long-standing argument that there is another (and perhaps more important) factor driving diversity; to our knowledge this case was first compellingly laid out by Ehrlich and Raven in 1964 [15]: "It is apparent that reciprocal selective responses have been greatly underrated as a factor in the origination of organic diversity." (Already Darwin noted the significance of co-evolution, e.g., between orchids and the moths that feed on their nectar; but the proposed implication for diversity seems to have come later.) In the ensuing decades this idea played a role in the Red Queen Hypothesis [47] and was advanced as an explanation of an advantage of sexual over asexual reproduction [6].

Apart from empirical study (e.g., [10, 46, 35, 8]), the dynamics of co-evolution have also been studied mathematically, but primarily (explicitly or implicitly) for asexual reproduction—dynamics in which the abundance of a genome changes over time in proportion to its fitness (possibly with mutations), as in the work of Eigen, Schuster and others [16, 17, 32, 44, 45]. The case of sexual reproduction, however, is quite different. There is a good mathematical model for these dynamics, called the “weak selection” model [31], but effects of co-evolution are not yet understood in this model.

We study a specific class of systems in this model, and provide a quantitative study of the evolutionary dynamics of sexual species in highly competitive ("zero sum") interactions. This study supports the thesis of Ehrlich and Raven, that competition drives diversity, in a strong form: not only does a genetic monoculture not take over, but in fact the entropy of the species' genomes is bounded away from zero for all time. Thus we support a rationale for ecosystem diversity without invoking mutation, speciation or environmental change.

A sexual species under weak evolutionary pressures (to be made precise) can be modeled, game-theoretically, as a team whose players are the genes [25, 12]. A team in a multiplayer game [26, 40] is a set of players who share a common payoff but use independent randomness. In our setting we have two teams that compete against each other for survival. Learning dynamics in such games create dynamical systems entirely unlike those explored in the no-regret learning in games literature [9]. For example, although there have been quite a few papers arguing about non-equilibrium limit cycles in game dynamics [13, 21, 23, 34, 33], these typically explore small games with at most two or three players, each with two or three available strategies. Even in these settings the analysis is typically intricate and is based on case-by-case observations that do not generalize to large classes of games. In a handful of cases where non-equilibrium behavior is proven [37, 36, 30] in larger games, the non-equilibrium behavior is typically non-periodic, and thus to a large extent unpredictable. In contrast, we present a large parametric class of multi-agent games for which we prove periodicity from almost all initial conditions.

Class of Games: We establish our results in the restrictive setting of "Boolean phenotypes". To explain, we will be studying a zero-sum competition between two species, A and B. Organisms of species A have n genes, and organisms of species B have m genes. Each is haploid (possessing, for each gene, exactly one of the possible alleles), so that a genotype of species A is a vector in {0, 1}^n, and that of species B is a vector in {0, 1}^m. In the Boolean-phenotype model there are Boolean-valued functions f : {0, 1}^n → {0, 1} and g : {0, 1}^m → {0, 1} and a 2 × 2 payoff matrix U such that the result of an encounter between a genotype s of A and a genotype σ of B is a payoff of U_{f(s), g(σ)} (to A, and minus this to B). (Like any multiplayer game, such team zero-sum games have Nash equilibria; but unlike the two-player zero-sum games which they resemble, the team game generally has a positive duality gap. See [40], where this is worked out, and earlier [49] for the case of a team vs. a single player.) This gap creates an opportunity for very rich dynamics when each gene continually adjusts its allele frequencies to the competition. As compared with the full reach of evolutionary dynamics this is a limited setting, but there are two good reasons to examine it.

1. This already constitutes an extension, to two competing and adapting learners, of the study of evolutionary learning of Boolean functions initiated in [25].

2. There are biological examples which approximately fit this assumption—not with respect to the entire phenotype, but w.r.t. that aspect of the phenotype which is critical to the two-species interaction. E.g., the length of a hummingbird’s beak vs. that of a flower; the size of a crab’s pincers vs. the shell thickness of its prey; the choice of protein coatings employed by a microbe, and the corresponding immune response.

Not every game is, of course, zero sum; e.g., above, the interaction may be favorable to the hummingbird and to the flowering plant. However, we are interested in the effects of competition, and competition, in its purest form, is zero-sum. There is good reason, however, to view this zero-sum interaction not as between two players but as between two teams, with the members of each team being the genes of the species (a point of view influentially advocated by Dawkins [14]). The fact that the genes are, as “agents”, pursuing their optimization independently of one another, is crucial to the dynamics of sexual evolution.

Dynamics: The most tractable model of sexual evolution is to have the population at all times be in a product distribution, evolving according to the replicator equation. (The correspondence between this continuous dynamic and the discrete-time MWU was described in [22, 27, 12]; MWU is a ubiquitous meta-algorithm with numerous connections within the field of computer science [3]. In this paper we work directly in the replicator framework.) The replicator equation [42, 41], given below in Eq. 1, is among the basic tools in mathematical ecology, genetics and the mathematical theory of evolution. Replicator dynamics and MWU have been studied extensively in numerous classes of games. For example, in the case of potential games, where the utilities of all agents are perfectly aligned, these learning processes are known to converge to Nash equilibria, and in fact generically to pure (non-randomized) Nash [22]. On the contrary, in zero-sum games [39, 1, 37] it is known that they can exhibit complex chaotic behavior, highly sensitive to initial conditions (butterfly effect).

In our setting, an exceedingly rare form of structure manifests. We identify a novel class of conservative systems, analogous to Hamiltonian dynamics (e.g., the ideal pendulum). The conserved quantity does not have the interpretation of an energy. This "constant of the motion" is a new type of structure, not previously reported, that emerges from the combination of evolutionary dynamics and Boolean logic (Lemma 3.5). Possession of a conserved quantity is no reason to expect periodicity, as the dynamics are high dimensional (n + m variables, minus one degree of freedom for the constant of the motion). The orbits are shown to be periodic via a novel type of embedding argument: each orbit can be projected to a, possibly different, planar (and thus periodic) conservative system without any loss of information (Proposition 3.4 and Lemma 3.4).

Theorem 1: Given any two-team zero-sum game defined by two Boolean functions, so long as all equilibria/fixed points are isolated, all but a zero-measure set of initial conditions lie on periodic trajectories of the replicator dynamics. (Formal statement in Section 3, Theorem 3.5.)

An immediate corollary of our theorem is that since all monocultures are fixed points of the dynamics, almost all initial conditions lie on trajectories that remain bounded away from the monocultures.

## 2 Preliminaries

#### Notation:

Vectors are in bold-face and, unless otherwise indicated, are considered as column vectors. The transpose of x is x^T. The i-th coordinate of x is x_i. x_{-i} is the vector derived by removing x_i from x. |S| is the cardinality of set S. Δ(S) is the probability simplex with support set S, i.e., Δ(S) = {p ∈ R^{|S|} : p_s ≥ 0 for all s ∈ S, Σ_{s ∈ S} p_s = 1}.

#### Zero-Sum Games among Species:

We have two species, A and B. A's genome has n genes, and B's has m genes. Both are haploid, which means that each organism has one allele per gene. (This is the simplest possibility. Humans are diploid, and other numbers are possible.) The proportion of organisms of species A that have allele 0 in their i-th gene is written x_i. Similarly, the proportion of organisms of species B that have allele 0 in their j-th gene is y_j. Clearly, 0 ≤ x_i ≤ 1 and 0 ≤ y_j ≤ 1 for any i or j. In this paper we focus on the simplified setting in which each gene of the host/parasite organism has two variants/alleles (i.e., alleles 0 and 1). Thus, the genotype of an organism in A (resp. B) is a Boolean vector of length n (resp. m). The vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_m) expressing the composition of alleles in each population of organisms can be thought of as encoding a randomized strategy for each gene (the mixed strategy x_i for gene i in species A).

A more important simplifying assumption is that each genotype produces one of only two possible phenotypes (e.g., Rh blood factor, long vs. short beak, etc.). A phenotype in our context is simply an arbitrary Boolean function on the genome; for a genotype s of species A we let f(s) be the phenotype of the organism, and likewise g(σ) for a genotype σ of species B. The significance of this mapping is that organisms interact only through their phenotypes. The payoff for this interaction is given by a utility (or fitness) function u(s, σ) = U_{f(s), g(σ)}, where U is a payoff matrix of dimension 2 × 2. When organisms s ∈ A and σ ∈ B interact, each gene in organism s receives the same utility U_{f(s), g(σ)}, while each gene in organism σ receives the same utility −U_{f(s), g(σ)}.

A natural example is as follows: A is a parasite and B is the host. If the outcomes of the two functions match, i.e., the "key" of the parasite matches the "lock" of the host, then the utility of the parasite is +1 and the utility of the host is −1. Otherwise, the utilities are reversed. In this case, the matrix U is the Matching Pennies payoff matrix.
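This lock-and-key payoff can be written out in a few lines; the conventions here (parasite as the row player, +1 for a match) are our illustrative choices, not fixed by the paper:

```python
# Sketch of the host-parasite payoff as a Matching Pennies game.
# Conventions (parasite = row player, +1 on a match) are illustrative.

def encounter_payoff(parasite_bit: int, host_bit: int) -> int:
    """Payoff to the parasite; the host receives the negative (zero-sum)."""
    U = [[1, -1],   # parasite output 0: matches host 0, mismatches host 1
         [-1, 1]]   # parasite output 1: mismatches host 0, matches host 1
    return U[parasite_bit][host_bit]

assert encounter_payoff(0, 0) == 1    # "key" fits "lock": parasite wins
assert encounter_payoff(1, 0) == -1   # mismatch: host wins
```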

More generally, we allow the zero-sum game defined by

 U = ( a  b
       c  d )

to be any zero-sum game that has a unique Nash equilibrium, which is fully mixed (equivalently, the best-response sequence cycles, clockwise or anti-clockwise, along the four pure outcomes). (Otherwise, the competition is trivial: any generic zero-sum game without a fully mixed equilibrium can be solved via iterated elimination of strictly dominated strategies, and convergence to equilibrium for the whole system follows from standard arguments; e.g., see [50].) Our analysis applies to all rescaled zero-sum games that are not dominance solvable [20].
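For concreteness, the interior (fully mixed) equilibrium of such a 2×2 zero-sum game follows from the standard indifference conditions; the indexing below (row player maximizing, mixing probability p* on the first row, column player mixing q* on the first column) is our convention, not necessarily the paper's:

```latex
% Row player indifferent between rows fixes the column mix q*:
a q^* + b(1-q^*) = c q^* + d(1-q^*)
  \;\Longrightarrow\; q^* = \frac{d-b}{(a-c)+(d-b)},
% Column player indifferent between columns fixes the row mix p*:
a p^* + c(1-p^*) = b p^* + d(1-p^*)
  \;\Longrightarrow\; p^* = \frac{d-c}{(a-b)+(d-c)}.
```

Both fractions lie in (0, 1) exactly when the game is not dominance solvable; for Matching Pennies (a = d = 1, b = c = −1) this gives p* = q* = 1/2.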

#### Replicator Dynamics and Weak Selection in the Evolution of Sexual Species

If D_A is the distribution on genomes of species A and D_B is the distribution on genomes of species B, write F(s) for the fitness of genome s, namely its expected payoff against D_B, and F̄ for the average fitness of the population of A. The replicator dynamics are that the rate of change of D_A(s) is proportional to D_A(s)(F(s) − F̄). A few lines of calculation show that the resulting rate of change of x_i (the fraction of the population having allele 0 in gene i) is

 ẋ_i = x_i(1 − x_i)(u_{i0} − u_{i1}),   and ẋ_i = 0 if x_i ∈ {0, 1}    (1)

where u_{ib} is the expected fitness of allele b of gene i. Similarly, for the genes of species B,

 ẏ_j = −y_j(1 − y_j)(u_{j0} − u_{j1})    (2)

The work of Nagylaki in 1993 focused attention on the study of these dynamics in the "weak selection" model. In a sexual species, panmictic mating (mating of individuals selected uniformly and independently) without selection pressures leads over time to the genome distribution being a product distribution, that is, a product of its marginals. The weak selection model makes the approximation that selection is slow enough relative to reproduction that the genome may be considered at all times to be in a product distribution, with time-dependent marginals x_i and y_j. The prior work [12, 25, 27, 29] is entirely within this model, and it will be our focus as well. In weak selection, Equations (1)–(2) may be considered a complete description of the process rather than merely summary statistics.
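As a sanity check, Equations (1)–(2) can be integrated numerically in the two-team setting. Everything concrete below (the XOR and OR phenotypes, the Matching Pennies anchor (p, q) = (1/2, 1/2), the initial allele frequencies) is an illustrative assumption rather than the paper's choice; the vector field follows the reduced form stated later in Lemma 3.2:

```python
# Sketch: integrate the two-team replicator dynamics under weak selection.
# x[i] = frequency of allele 0 at gene i (product distribution).
from itertools import product

def expectations(x, phenotype):
    """Return E[phenotype] and E[phenotype | gene i plays allele b]."""
    n = len(x)
    total, cond = 0.0, [[0.0, 0.0] for _ in range(n)]
    for bits in product((0, 1), repeat=n):
        pr = 1.0
        for i, b in enumerate(bits):
            pr *= x[i] if b == 0 else 1 - x[i]
        v = phenotype(bits)
        total += pr * v
        for i, b in enumerate(bits):
            pi = x[i] if b == 0 else 1 - x[i]
            if pi > 0:
                cond[i][b] += (pr / pi) * v   # pr/pi = Pr[other genes]
    return total, cond

def field(x, y, f, g, p=0.5, q=0.5):
    """Reduced replicator vector field for the two competing teams."""
    fbar, fc = expectations(x, f)
    gbar, gc = expectations(y, g)
    dx = [xi * (1 - xi) * (fc[i][0] - fc[i][1]) * (gbar - q)
          for i, xi in enumerate(x)]
    dy = [-yj * (1 - yj) * (gc[j][0] - gc[j][1]) * (fbar - p)
          for j, yj in enumerate(y)]
    return dx, dy

def rk4(x, y, f, g, dt=0.01, steps=5000):
    traj = []
    for _ in range(steps):
        k1 = field(x, y, f, g)
        k2 = field([a + dt/2*d for a, d in zip(x, k1[0])],
                   [b + dt/2*d for b, d in zip(y, k1[1])], f, g)
        k3 = field([a + dt/2*d for a, d in zip(x, k2[0])],
                   [b + dt/2*d for b, d in zip(y, k2[1])], f, g)
        k4 = field([a + dt*d for a, d in zip(x, k3[0])],
                   [b + dt*d for b, d in zip(y, k3[1])], f, g)
        x = [a + dt/6*(u1 + 2*u2 + 2*u3 + u4)
             for a, u1, u2, u3, u4 in zip(x, k1[0], k2[0], k3[0], k4[0])]
        y = [b + dt/6*(u1 + 2*u2 + 2*u3 + u4)
             for b, u1, u2, u3, u4 in zip(y, k1[1], k2[1], k3[1], k4[1])]
        traj.append(x + y)
    return traj

xor2 = lambda bits: bits[0] ^ bits[1]
or2 = lambda bits: bits[0] | bits[1]
traj = rk4([0.45, 0.65], [0.5, 0.8], xor2, or2)
```

In our runs the allele frequencies oscillate and stay bounded away from fixation (0 or 1), matching the diversity prediction.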

What distinguishes our dynamics from prior work is that the fitness of a genotype is no longer a constant but depends on the composition of the population of the other species; the fitness of an allele of a particular gene depends on the compositions of both populations (except in that one gene).

#### Genes↔Agents, Species↔Team of Agents, Allele↔Strategy.

In terms of the analysis, it will be helpful to think of the biological setting in purely game-theoretic terms. The immediate game-theoretic analogue of this setting is a competition between two teams, A and B. The first team has n agents, whereas the second has m. Each agent has two strategies, strategy 0 and strategy 1, and we denote by x_i the probability with which agent i chooses strategy 0. Given a strategy outcome, the choices of all agents in team A (resp. B) are used as input to the Boolean function of team A (resp. B), and each team participates with its respective output in the zero-sum game U. All agents in a team enjoy exactly the same utility, i.e., their team's utility in game U. We denote by u_i, u_{i0}, u_{i1} respectively the expected utility of agent i, the expected utility of agent i given that he chooses 0, and the expected utility of agent i given that he chooses 1 (where the randomness is over the product distribution over the mixed strategies of all other agents).

#### Existence and Uniqueness of Global Solution

The theory of differential equations ensures (for more details see, e.g., Chapter 6 of [50] or [2]) that the replicator dynamics of a multi-player game from initial conditions (i.e., initial probability distributions) z_0 have a unique global solution Φ(z_0, t); furthermore, this solution is smooth as a function of time and initial conditions. We define a trajectory or orbit through an initial state z_0 as the image of the whole time axis under the solution mapping Φ:

 Traj(z_0) = { z : z = Φ(z_0, t) for some t ∈ R }

To ease notation, when keeping track of the initial condition is not critical, we write z(t) instead of Φ(z_0, t). We write x (resp. y) to denote the current (product) mixed strategy profile of the genes in species A (resp. B), and x_i (resp. y_j) to denote the mixed strategy of a specific gene.

## 3 Analysis

### 3.1 Overview of the Proof

A critical insight is that instead of tracking the true state of the system (x, y), and directly trying to argue about the system in its native state space, we will focus on the expected outputs of the Boolean functions of the two teams of agents. We will denote these quantities by f and g. (The expected output of team A's Boolean function is clearly a function of x; however, sometimes to simplify notation we will just write f(t) when we wish to focus on the time dependency, or just f.) As it turns out, it will be convenient to think of each team's expected output as encoding a mixed strategy implemented in the zero-sum game U.

In Section 3.2

, we identify and classify the equilibria (fixed points) of the dynamics. These equilibria can be grouped into two categories.

Nash fixed points are states in which the expected output of each Boolean function encodes the unique (fully mixed) Nash equilibrium of the zero-sum game U. These are stationary because no team can deviate and gain the upper hand over its opposing team. The second type of fixed points are states where at least one of the two teams got "stuck": its opposing team (e.g., team B) is not (necessarily) implementing its minimax strategy, but nevertheless no agent of team A can influence the expected output of his team's Boolean function via unilateral deviations. One such example is when team A is implementing a XOR function and at least two of its agents choose between 0 and 1 uniformly at random. We call these fixed points strange fixed points, as they intuitively correspond to evolutionary flukes, where no change to any single gene can affect the composition of the species.
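The XOR example can be checked directly; the three-agent team and the particular probabilities below are made up for illustration:

```python
# If at least one independent input bit is uniform, the XOR of the bits is
# uniform; hence no unilateral deviation moves the team's expected output.
from itertools import product

def expected_xor(p):
    """E[XOR of independent bits], where p[i] = Pr[bit i = 1]."""
    e = 0.0
    for bits in product((0, 1), repeat=len(p)):
        pr = 1.0
        for pi, b in zip(p, bits):
            pr *= pi if b == 1 else 1 - pi
        parity = 0
        for b in bits:
            parity ^= b
        e += pr * parity
    return e

# Agents 2 and 3 randomize uniformly; agent 1's strategy is irrelevant:
for p1 in (0.0, 0.25, 0.7, 1.0):
    assert abs(expected_xor([p1, 0.5, 0.5]) - 0.5) < 1e-12
```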

In Section 3.3, we focus on a single population and show that the corresponding vector field can be expressed as a product of a scalar "rate" term (that depends on the mixed strategy of the opposing species) and a vector field that depends only on the team's own behavior (Proposition 3.3). Thus, the trajectory that each team traverses depends only on the Boolean function that it implements (e.g., f) and its own initial condition (e.g., x(0)). Effectively, this trajectory corresponds to replicator dynamics in a common-utility/potential game where the joint utility of all agents at any mixed strategy outcome is the expected output of the team's Boolean function.

This connection to potential games comes in handy for several reasons. First, in Theorem 3.3, we prove that strange fixed points are indeed evolutionary flukes that can be ruled out under typical genericity conditions. Secondly, in Proposition 3.3, we can already establish that f and g exhibit a natural "chasing" relationship, which is an early step towards proving periodicity. That is, if we interpret the expected output of each species as a mixed strategy with which the species participates in the zero-sum game U, then this mixed strategy will move in the direction that would increase its expected payoff given the mixed strategy of the opposing species. This creates a "chasing" behavior, with the directionality of the movement of each mixed strategy (i.e., increasing or decreasing) flipping when the opposing team's strategy transitions through the unique (mixed) Nash equilibrium of the game.

Furthermore, in Section 3.4, we leverage the connection to potential games to formally prove that it suffices to keep track only of the quantities f and g. As explained above, these quantities correspond to the potential functions of common-utility/potential games in which the joint utility of all agents at any mixed strategy outcome is the expected output of the team's Boolean function. In such a potential game, along any nontrivial trajectory the common utility/potential is strictly increasing with time, and thus, given any initial condition, there exists a bijective function between the time range over which the trajectory is defined and the range of potential values over this trajectory. Thus, given the initial condition of the team's behavior, x(0), and the current output of the team, f(t), the current behavior of each member of the team is uniquely defined. In a sense, each trajectory can be embedded into a two-dimensional system: given the initial conditions of both teams, as long as we keep track of each team's expected output we can uniquely identify the exact state of the system.

In order to prove the periodicity of the system we need to establish connections to the theory of Hamiltonian systems. In Section 3.5 we show that the two-dimensional system coupling f and g together is effectively a conservative system that preserves an energy-like function. Specifically, up to team-specific reparametrizations and changes of variables, the dynamical system has the form of the standard Hamiltonian system

 ẋ = y,  ẏ = −x,

with Hamiltonian function (x² + y²)/2. In this parameterization all trajectories are cycles centered at the origin. In our case, this conserved quantity, or "constant of the motion", is more elaborate; nevertheless, once we establish that it exists, we can leverage standard tools from the topology of dynamical systems (Appendix C) and establish that its orbits are periodic (Theorem 3.5). Putting everything in this section together, Theorem 1 (more formally given as Theorem 3.5) follows.

Finally, in Section 4 we investigate the game-theoretic properties of these periodic orbits. By periodicity, the time-averages of all involved quantities are well defined. In Theorem 4 we show that the time-average play over the strategy outcomes of the team zero-sum game is a correlated equilibrium of that game. Unlike zero-sum games, team zero-sum games [40] (even Boolean team zero-sum games) may have sets of (Nash) equilibria that include outcomes of widely varying utilities for each team. Nevertheless, in Theorem 2 (given more formally as Theorem 4), we establish that sexual evolution leads to gene coordination at the species level in a time-averaged sense. Namely, the time average of the output of each team is equal to the unique fully mixed Nash equilibrium strategy of that team in the zero-sum game U. Furthermore, the average utility of each team (and hence of all its members) is exactly equal to its value in the two-species zero-sum competition.

### 3.2 System Description & Fixed Points: The Nash, the Strange & the Partial

###### Lemma.

There exists a constant α > 0 such that the replicator system equations reduce to:

 ẋ_i = α x_i(1 − x_i)(f_{i0} − f_{i1})(g − q)   for all agents i in team A,
 ẏ_j = −α y_j(1 − y_j)(g_{j0} − g_{j1})(f − p)   for all agents j in team B,

where (p, q) is the unique fully mixed Nash equilibrium of game U, and where f_{ib} (resp. g_{jb}) denotes the expected output of team A's (resp. team B's) Boolean function given that agent i (resp. j) plays b, for b ∈ {0, 1}.

The proof of Lemma 3.2 is deferred to Appendix A. The reparametrization by α does not affect the shape of the trajectories; hence, we assume wlog that α equals 1.

Structure of fixed points. Amongst all equilibria with full support, there exist two different types of fixed points. The first type corresponds to outcomes where f = p and g = q, i.e., outcomes in which the expected output of each Boolean function encodes the unique (fully mixed) Nash equilibrium of the zero-sum game. We call these Nash fixed points. In the second type of fixed points, we have either f_{i0} = f_{i1} for all agents i of team A, or g_{j0} = g_{j1} for all agents j of team B, or both. We call these fixed points strange fixed points. (It may be the case that one fixed point satisfies the definition of both a Nash fixed point and a strange fixed point. E.g., when both teams implement the XOR function and play against each other in a Matching Pennies game, the uniformly mixed strategy profile is both a Nash fixed point and a strange fixed point. This non-exclusivity simplifies the exposition and thus we allow it.) These are fixed points where at least one of the two teams got "stuck": its opposing team (e.g., team B) is not (necessarily) implementing its minimax strategy, but nevertheless no agent of team A can influence the expected output of his team's Boolean function via unilateral deviations. One such example is when team A is implementing a XOR function and at least two of its agents choose between 0 and 1 uniformly at random.

Finally, there exist fixed points in which some agents are using pure strategies (e.g., x_i = 0 or x_i = 1). We call these partial support fixed points. We complete the categorization by defining as partial support Nash fixed points (resp. partial support strange fixed points) those partial support fixed points with at least one randomizing agent such that, when examining the subgame defined by the strategies played with positive probability, they encode a Nash (resp. strange) fixed point. Fixed points without any randomizing agents are called pure fixed points.

### 3.3 The Topology of the Trajectory of Team A is Independent of Team B

###### Proposition.

The trajectory of team A is a subset of the trajectory of the system

 ẋ_i = x_i(1 − x_i)(f_{i0} − f_{i1})   for all agents i in team A,

with initial condition x(0), and is thus independent of team B. We call this system team A's subsystem S_A, and we denote its solution by Φ_A(x(0), t). Moreover, f is a strict Lyapunov function in this system. That is, given any initial condition, we have df/dt > 0 unless x is a fixed point of S_A.

###### Proof

The multiplicative term α(g − q) is common across all coordinates of the vector field corresponding to agents of team A in Lemma 3.2. Hence, it dictates the magnitude of the vector field (the speed of the motion), but does not affect directionality other than moving backwards or forwards along the same trajectory. Specifically, the systems ż = V(z) and ż = −V(z) have exactly the same orbits (but traverse them in opposite directions). So, the trajectory of team A in our original system corresponds to a subset of a specific orbit of subsystem S_A: the orbit that starts at the initial condition x(0) (team A's initial condition in the original system).

Moreover, we will show that f is a strict Lyapunov function for this projected system. Due to the multilinearity of f in the x_i's: f = x_i f_{i0} + (1 − x_i) f_{i1}, and therefore ∂f/∂x_i = f_{i0} − f_{i1}. Combining this with the definition of the vector field of subsystem S_A we have that

 df/dt = Σ_i (∂f/∂x_i) ẋ_i = Σ_i x_i(1 − x_i)(f_{i0} − f_{i1})²

The summation is clearly nonnegative, and it is equal to zero only at the fixed points of S_A.

Connection to potential games. Team’s subsystem is equivalent to applying replicator dynamics to a partnership game (a game where all agents receive the same payoff/utility at each (mixed) outcome) where the common utility function at mixed strategy is equal to . A partnership game is a potential game with potential function equal to the common utility. The potential is a strictly increasing function along any non-trivial system trajectory and all initial conditions implying convergence to equilibria (see e.g., [22]). We will leverage this connection and our current understanding of replicator dynamics in potential games from [22] to argue that in system only a measure zero set of initial conditions may converge to strange fixed points.

###### Definition.

We call an initial condition x(0) of subsystem S_A safe if and only if the orbit does not converge to a non-pure fixed point (non-pure = a fixed point with at least one randomizing agent) as t → ±∞. An analogous definition applies for subsystem S_B as well. We call an initial condition (x(0), y(0)) of the full system safe if and only if both x(0) and y(0) are safe in their respective subsystems.

By Proposition 3.3, given a safe initial condition (x(0), y(0)), its respective orbit clearly cannot converge to a (partial support) strange fixed point as t → ±∞. Indeed, a necessary condition for convergence of the full system to a (partial support) strange fixed point as t → ±∞ is that either x(t) or y(t) converges to a non-pure fixed point of its subsystem as t → ±∞.

###### Theorem.

If the fixed points of a subsystem are isolated, then all but a measure zero set of its initial conditions are safe.

The proof of Theorem 3.3 is deferred to Appendix B. At this point we can show that the expected outputs of the two teams, when viewed as mixed strategies (i.e., probability distributions) in the zero-sum game U, are updated in a "rational" way. Specifically, when they are updated according to the system equations, they develop a "chasing" behavior in which each mixed strategy moves in the direction that myopically increases its expected payoff in the zero-sum game. This statement in itself does not suffice to establish periodicity, as one can easily create trajectories (not of our dynamics, of course) that spiral towards the fully mixed Nash equilibrium, or diverge to the boundary, while displaying this chasing behavior.

###### Proposition.

Unless no agent of team A can influence the output of f via unilateral deviations, the expected output of team A's Boolean function, f, will increase (decrease) if and only if the output g of team B is larger (smaller) than q (the probability assigned to the first action in team B's unique Nash equilibrium strategy of the zero-sum game U). Similarly, g will decrease (increase) when the output f of team A is larger (smaller) than p (the probability assigned to the first action in team A's unique Nash equilibrium strategy of the zero-sum game U).

###### Proof

Due to the multilinearity of f in the x_i's: f = x_i f_{i0} + (1 − x_i) f_{i1}, and therefore ∂f/∂x_i = f_{i0} − f_{i1}. Combining this with the vector field form presented in Lemma 3.2 we have that

 df/dt = Σ_i (∂f/∂x_i) ẋ_i = (g − q) Σ_i x_i(1 − x_i)(f_{i0} − f_{i1})²

The summation is clearly nonnegative. In fact, it is equal to zero only at the fixed points of team A's subsystem. The argument for g is symmetric.
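The chasing property can be checked numerically at any interior state; the phenotypes, the anchor (p, q) = (1/2, 1/2), and the state below are illustrative assumptions:

```python
# Sanity check of the chasing relationship: f increases exactly when g > q,
# and g decreases exactly when f > p (at states where the rate terms are
# nonzero). All concrete choices here are illustrative.
from itertools import product

def expectations(x, phenotype):
    """Return E[phenotype] and E[phenotype | gene i plays allele b]."""
    n = len(x)
    total, cond = 0.0, [[0.0, 0.0] for _ in range(n)]
    for bits in product((0, 1), repeat=n):
        pr = 1.0
        for i, b in enumerate(bits):
            pr *= x[i] if b == 0 else 1 - x[i]
        v = phenotype(bits)
        total += pr * v
        for i, b in enumerate(bits):
            pi = x[i] if b == 0 else 1 - x[i]
            if pi > 0:
                cond[i][b] += (pr / pi) * v
    return total, cond

p = q = 0.5
f = lambda bits: bits[0] ^ bits[1]          # team A: XOR
g = lambda bits: bits[0] | bits[1]          # team B: OR
x, y = [0.45, 0.65], [0.5, 0.8]

fbar, fc = expectations(x, f)
gbar, gc = expectations(y, g)
dfdt = (gbar - q) * sum(xi * (1 - xi) * (fc[i][0] - fc[i][1]) ** 2
                        for i, xi in enumerate(x))
dgdt = -(fbar - p) * sum(yj * (1 - yj) * (gc[j][0] - gc[j][1]) ** 2
                         for j, yj in enumerate(y))

assert (dfdt > 0) == (gbar > q)   # f chases g's position relative to q
assert (dgdt < 0) == (fbar > p)   # g retreats when f is above p
```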

From this point forward we will focus on safe initial conditions; these form a full measure set within the set of all initial conditions. We will prove that any such state is periodic, i.e., it lies on a closed orbit, by establishing connections to planar Hamiltonian systems.

### 3.4 Reduction to 2-Dimensional Systems via Competing Lyapunov Functions

The next proposition states that knowledge of the initial condition, as well as of the evolving values of f(t) and g(t), suffices (in principle) to recover the complete system state at any time t.

###### Proposition.

Given a safe initial condition (x(0), y(0)) of the system, as well as the values f(t), g(t), there exist smooth functions X_i, Y_j such that x_i(t) = X_i(f(t)) for each agent i of team A, resp. y_j(t) = Y_j(g(t)) for each agent j of team B, for all t.

###### Proof

If f (resp. g) is time invariant, then the problem is trivial. Suppose not. We will argue that, given an initial condition x(0), for every agent i of team A its mixed strategy at time t, as captured by x_i(t), is uniquely defined given f(t). We know that the curve traced by the agents of the first team is determined by their initial conditions and the function that they implement. Specifically, it is contained in the trajectory of team A's subsystem S_A. So, as long as we can uniquely pinpoint a state on the subsystem's trajectory given x(0) and f(t), this must correspond to team A's state in the complete system. Moreover, in subsystem S_A, df/dt > 0 unless we are at a fixed point. However, since f is not time-invariant, x(0) is not a fixed point of subsystem S_A. Finally, by the uniqueness of the system solutions, we cannot reach a fixed point in finite time, and hence df/dt > 0 for all times t. So f, as a function of time in subsystem S_A, is always increasing, and it is smooth since it is a composition of smooth functions. Thus, by the inverse function theorem (see Appendix C) its inverse f^{-1}_{x(0)} exists (the inverse function is effectively parametrized by x(0), which is why we write f^{-1}_{x(0)}), is smooth, and is strictly increasing. Thus, it is bijective between its domain (the range of values of f along the trajectory) and the time axis, and given an input in its domain it returns the unique time instance at which the Lyapunov function f in subsystem S_A attains that value given initial condition x(0). Hence this time instance is well defined given f(t) alone. We define π_i as the projection function that, given a vector, returns its i-th element, i.e., π_i(x) = x_i. Putting everything together, X_i = π_i ∘ Φ_A(x(0), ·) ∘ f^{-1}_{x(0)} indeed recovers the accurate state of agent i in team A given the current value of f in the full system, since the full-system state lies on the subsystem trajectory. Finally, we continuously extend X_i to the closure of its domain.

We return to the study of the two-team system and argue about the periodicity of its orbits. Due to the existence of the functions above, mapping each value of $f$ (resp. $g$) to a unique team state, the periodicity of $(f,g)$ extends to the trajectories of the full system. To simplify notation, we will from now on suppress the dependence on the initial condition, but it should be kept in mind.

###### Definition.

$(p,q)$-planar dynamical system: We define the following class of planar dynamical systems on $[0,1]^2$, parametrized by a point $(p,q) \in (0,1)^2$ and two smooth functions $r, w$ defined on $[0,1]$ that are strictly positive on $(0,1)$. Given such $p, q, r, w$, we define the corresponding $(p,q)$-planar dynamical system as follows:

$$\frac{d\xi}{dt} = r(\xi)\,(\zeta - q), \qquad \frac{d\zeta}{dt} = -w(\zeta)\,(\xi - p)$$

The existence, uniqueness, and smoothness of global solutions in the case of $(p,q)$-planar dynamical systems follow from standard arguments, since the compact region $[0,1]^2$ is invariant and the vector field is smooth (see, e.g., [2]).
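As a sanity check on this definition, the following minimal sketch (our code, not the paper's; the parameter choices are hypothetical) integrates the simplest instance, with $p = q = 1/2$ and $r \equiv w \equiv 1$, for exactly one period and verifies that the orbit closes up:

```python
import math

P, Q = 0.5, 0.5  # hypothetical center; r = w = 1 everywhere

def field(xi, zeta):
    # d(xi)/dt = r(xi) (zeta - Q),  d(zeta)/dt = -w(zeta) (xi - P), with r = w = 1
    return (zeta - Q, -(xi - P))

def rk4(xi, zeta, h):
    # one classical Runge-Kutta step
    k1 = field(xi, zeta)
    k2 = field(xi + 0.5 * h * k1[0], zeta + 0.5 * h * k1[1])
    k3 = field(xi + 0.5 * h * k2[0], zeta + 0.5 * h * k2[1])
    k4 = field(xi + h * k3[0], zeta + h * k3[1])
    return (xi + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            zeta + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

xi, zeta = 0.6, 0.5          # start on a radius-0.1 circle around (P, Q)
n = 10_000
h = 2 * math.pi / n          # one full period of the resulting harmonic oscillator
for _ in range(n):
    xi, zeta = rk4(xi, zeta, h)
print(xi, zeta)  # back at (0.6, 0.5) up to integration error
```

In this linear special case the orbits are literally circles around $(p,q)$; the general nonlinear case is handled by the constant of motion constructed in Section 3.5.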

###### Lemma.

Given a safe initial condition of the system, there exists a $(p,q)$-planar dynamical system such that, if $(\xi(0), \zeta(0)) = (f(0), g(0))$, then $(\xi(t), \zeta(t)) = (f(t), g(t))$ for all $t \ge 0$.

###### Proof

For our two-team system we have that

$$\frac{df}{dt} = \sum_i \frac{\partial f}{\partial x_i}\,\dot{x}_i = (g - q)\sum_i x_i(1 - x_i)\big(f_{i0} - f_{i1}\big)^2 = (g - q)\,\underbrace{\sum_i X_i(f(t))\big(1 - X_i(f(t))\big)\Big(\mathbb{E}_{\mathbf{s}_{-i} \sim X_{-i}(f(t))} f(0, \mathbf{s}_{-i}) - \mathbb{E}_{\mathbf{s}_{-i} \sim X_{-i}(f(t))} f(1, \mathbf{s}_{-i})\Big)^2}_{r(f)}$$

where $r$ is a smooth function and is clearly nonnegative, since each summand is a product of the nonnegative factor $X_i(1 - X_i)$ and a square. If $f = 0$, then since the state corresponds to a product distribution and $f$ is a Boolean function, we have that for each agent $i$ either $X_i = 0$ or $X_i = 1$, or the value of $f$ over all outcomes in the support of the distribution is equal to $0$, in which case both conditional expectations vanish. Hence, $r(f) = 0$. Thus, in all cases, if $f = 0$ then $df/dt = 0$. A similar argument can be applied if $f = 1$. Finally, we will argue that, since the initial condition is safe, $r$ is strictly positive for $f \in (0,1)$. It suffices to show that for any state with $f \in (0,1)$, at least one summand of $r(f)$ is strictly positive. Since $f \in (0,1)$, there must exist some randomizing agents in the product distribution. Amongst these agents, there must exist an agent $i$ with $\mathbb{E}_{\mathbf{s}_{-i}} f(0, \mathbf{s}_{-i}) \neq \mathbb{E}_{\mathbf{s}_{-i}} f(1, \mathbf{s}_{-i})$, since otherwise the current state would be a non-pure fixed point of the first team's subsystem, contradicting the assumption that the initial condition is safe. The argument for $w$, i.e., for the second team, follows along the same lines as for the first.

### 3.5 Constants of Motion, Hamiltonian Systems, and Periodicity

Finally, we establish that the two-dimensional system coupling $f$ and $g$ together is effectively a conservative system that preserves an energy-like function. It is easy to check that if, in the definition of our $(p,q)$-planar dynamical system, we set the functions $r, w$ to be everywhere equal to $1$, then after the change of variables $u = \xi - p$, $v = \zeta - q$ the dynamical system takes the form $\dot{u} = v$, $\dot{v} = -u$, which is a prototypical Hamiltonian system with Hamiltonian function $\tfrac{1}{2}(u^2 + v^2)$. All of its trajectories are cycles centered at $(p,q)$.
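For this special case the computation is explicit (a worked instance, using the notation of the planar system above):

```latex
% With r \equiv w \equiv 1 and the change of variables u = \xi - p, v = \zeta - q:
\dot{u} = \zeta - q = v, \qquad \dot{v} = -(\xi - p) = -u,
% a harmonic oscillator with conserved energy
H = \int_p^{\xi} (z - p)\,dz + \int_q^{\zeta} (z - q)\,dz
  = \tfrac{1}{2}\,u^{2} + \tfrac{1}{2}\,v^{2},
% whose level sets are circles centered at (p, q) in the (\xi, \zeta) plane.
```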

The conserved quantity, i.e., the “constant of the motion”, in our case is described in the following lemma and leveraging it we will establish that the system trajectories are periodic.

###### Lemma.

The quantity

$$H(\xi, \zeta) = \int_p^{\xi} \frac{z - p}{r(z)}\,dz + \int_q^{\zeta} \frac{z - q}{w(z)}\,dz$$

is a constant of the motion (first integral) of the $(p,q)$-planar dynamical system, i.e., it is time-invariant given any initial condition.

###### Proof

By applying the chain rule to $H(\xi(t), \zeta(t))$ we have:

$$\frac{dH}{dt} = \frac{\xi - p}{r(\xi)}\,\dot{\xi} + \frac{\zeta - q}{w(\zeta)}\,\dot{\zeta} = \frac{\xi - p}{r(\xi)}\,r(\xi)(\zeta - q) - \frac{\zeta - q}{w(\zeta)}\,w(\zeta)(\xi - p) = (\xi - p)(\zeta - q) - (\zeta - q)(\xi - p) = 0.$$
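The invariance can also be checked numerically. In the sketch below (our code; the instance is hypothetical) we take $p = q = 1/2$ and $r(z) = w(z) = z(1-z)$, for which the two integrals defining $H$ evaluate, up to an additive constant, to $-\tfrac{1}{2}\ln(\xi(1-\xi)) - \tfrac{1}{2}\ln(\zeta(1-\zeta))$, and verify that $H$ stays constant along a numerically integrated trajectory:

```python
import math

def field(state):
    # r(z) = w(z) = z (1 - z), p = q = 1/2 (our hypothetical instance)
    xi, zeta = state
    return (xi * (1 - xi) * (zeta - 0.5), -zeta * (1 - zeta) * (xi - 0.5))

def rk4_step(state, h):
    # one classical Runge-Kutta step
    k1 = field(state)
    k2 = field(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = field(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = field(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def H(state):
    # closed form of the two integrals for this choice of r, w,
    # dropping additive constants (which do not affect invariance)
    xi, zeta = state
    return -0.5 * math.log(xi * (1 - xi)) - 0.5 * math.log(zeta * (1 - zeta))

state = (0.8, 0.3)
h0 = H(state)
drift = 0.0
for _ in range(100_000):
    state = rk4_step(state, 0.01)
    drift = max(drift, abs(H(state) - h0))
print(drift)  # tiny: H is (numerically) a constant of the motion
```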

###### Theorem.

Every safe initial condition lies on a periodic orbit of the system.

###### Proof

From lemma 3.4 the system corresponds to a $(p,q)$-planar system. If the initial condition is a Nash fixed point, then it is trivially a periodic point. Suppose it is not a Nash fixed point; then $\xi(0) \neq p$ or $\zeta(0) \neq q$ (or both). In all cases $H(\xi(0), \zeta(0)) > 0$, and due to lemma 3.5 the trajectory of the planar system stays bounded away from its unique interior equilibrium $(p,q)$, since $H(p,q) = 0$. Moreover, the gradient of $H$ at $(p,q)$ is equal to $0$, and thus we can create a trapping/invariant region. By the Poincaré-Bendixson theorem, and since the trapping (invariant) region does not contain any fixed points, the $\omega$-limit set of the trajectory is a periodic orbit. Since the gradient of $H$ is equal to $0$ only at $(p,q)$, the value $H(\xi(0), \zeta(0))$ is a regular value of $H$. By the regular value theorem its preimage is a manifold of dimension $1$. The union of the trajectory starting at the initial condition, along with its limit sets, is a closed, connected 1-manifold and thus it is homeomorphic to $S^1$ (see Appendix C).

At this point, we are ready to piece together all our structural characterizations of the system trajectories to derive our first main theorem:

###### Theorem.

Given any two-team zero-sum game defined by two Boolean functions, if all equilibria/fixed points are isolated, then all but a zero-measure set of initial conditions lie on periodic trajectories of the replicator dynamics.

###### Proof

By theorem 3.3, since all the system equilibria are isolated, all but a measure-zero set of initial conditions are safe. By lemmas 3.4 and 3.5, the projection of these trajectories onto the space of outputs of the two teams, i.e., onto the $(f,g)$-space, is periodic. Finally, by proposition 3.4, given a safe initial condition together with the values $f(t)$ (resp. $g(t)$), there exist smooth functions recovering the mixed strategy of each agent of the first team from $f(t)$ (resp. of each agent of the second team from $g(t)$); thus the periodicity of $(f,g)$ translates to a periodic orbit in the space of system behaviors, and the proof is complete.

## 4 Time Averages, Connections to Equilibria and Utility

Next, we will show that the time-average of the periodic trajectories of replicator dynamics satisfies some interesting game theoretic properties. In order to discuss these properties, it is useful to provide a reminder on some of the most basic solution concepts in game theory.

We give the definition of a correlated equilibrium, from [5].

###### Definition.

A correlated equilibrium (CE) is a distribution $\pi$ over the set of action profiles $S$ such that for every player $i$ and all strategies $s_i, s'_i \in S_i$,

$$\sum_{s_{-i} \in S_{-i}} u_i(s_i, s_{-i})\,\pi(s_i, s_{-i}) \;\ge\; \sum_{s_{-i} \in S_{-i}} u_i(s'_i, s_{-i})\,\pi(s_i, s_{-i})$$

We will also make use of the coarse correlated equilibrium ([51]), which is exactly the set of distributions that no-regret algorithms converge to. This convergence is only set-wise, i.e., the distance between the time-average behavior of no-regret dynamics and the set of CCE converges to zero; however, the time-average play may never converge to a specific CCE.

###### Definition.

A coarse correlated equilibrium (CCE) is a distribution $\pi$ over the set of action profiles $S$ such that for every player $i$ and every strategy $s_i \in S_i$,

$$\sum_{s \in S} u_i(s)\,\pi(s) \;\ge\; \sum_{s_{-i} \in S_{-i}} u_i(s_i, s_{-i})\,\pi_{-i}(s_{-i})$$

where $\pi_{-i}$ is the marginal distribution of $\pi$ with respect to the strategies of the players other than $i$.
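To make the deviation inequalities concrete, the following self-contained sketch (our encoding, not the paper's) checks the CCE condition for a distribution over action profiles in a small two-player game, here a rescaled Matching Pennies in which all utilities are nonnegative:

```python
from itertools import product

# Rescaled Matching Pennies: the row player gets 1 on a match, the column
# player gets 1 on a mismatch (this encoding is ours).
U = {("H", "H"): (1, 0), ("H", "T"): (0, 1),
     ("T", "H"): (0, 1), ("T", "T"): (1, 0)}
ACTIONS = ("H", "T")

def is_cce(pi, tol=1e-9):
    """Check the CCE inequalities for a distribution pi over profiles."""
    for player in (0, 1):
        base = sum(pi[s] * U[s][player] for s in pi)  # LHS: expected utility
        for dev in ACTIONS:  # RHS: unilateral deviation to the fixed action dev
            dev_val = sum(pi[s] * U[(dev, s[1]) if player == 0 else (s[0], dev)][player]
                          for s in pi)
            if dev_val > base + tol:
                return False
    return True

uniform = {s: 0.25 for s in product(ACTIONS, repeat=2)}
print(is_cce(uniform))  # True: the uniform product distribution is a CCE
```

For instance, a point mass on a single profile such as ("H", "H") fails the check, since the column player gains by deviating to "T".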

First, we will show that the time-average distribution over the space of strategy outcomes over any periodic orbit is a coarse correlated equilibrium. Furthermore, the time-average of the output of each team corresponds to the unique Nash equilibrium of the zero-sum game. Finally, the expected utilities of all agents correspond to the value of each of their respective teams in their zero-sum game. In effect, the sexual replicator dynamics enable the agents of each team to collaborate with each other so as to optimally solve the zero-sum game against the opposing team. Figure LABEL:fig:test2 shows specific examples of periodic trajectories where time-averaging over them converges to the solution of a Matching Pennies game (a rescaled Matching Pennies game in which all utilities are nonnegative) between two teams.
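This time-average behavior can be illustrated with a quick simulation (a sketch under our own payoff rescaling, in which a match pays one side 1 and a mismatch pays the other side 1; the starting point is arbitrary):

```python
def field(x, y):
    # x (resp. y) = probability the first (resp. second) side plays H;
    # replicator on Matching Pennies: u_H - u_T = 2y - 1 (resp. 1 - 2x)
    return (x * (1 - x) * (2 * y - 1), -y * (1 - y) * (2 * x - 1))

def rk4(x, y, h):
    # one classical Runge-Kutta step
    k1 = field(x, y)
    k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = field(x + h * k3[0], y + h * k3[1])
    return (x + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            y + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

x, y = 0.9, 0.5               # arbitrary interior starting point
h, steps = 0.01, 500_000
avg_x = avg_y = 0.0
for _ in range(steps):
    x, y = rk4(x, y, h)
    avg_x += x / steps        # running time averages
    avg_y += y / steps
print(avg_x, avg_y)           # both close to 0.5, the Nash mixture
```

The orbit itself cycles and never converges, yet its time average approaches the uniform Nash mixture of Matching Pennies.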

###### Theorem.

Given any periodic orbit, the time average distribution of play over strategy outcomes converges point-wise to a specific correlated equilibrium.

###### Proof

The time average of play is well defined and converges to a unique distribution over the space of strategy outcomes. Since the trajectory is periodic and the interior of the state space is invariant, the trajectory stays bounded away from the boundary of the state space. In this case the time average of the trajectory of the replicator converges to a coarse correlated equilibrium. More specifically, we will show that if any individual agent deviates to any fixed strategy then the time average of his expected utility does not increase. (Interestingly, we will show that it does not decrease either: the time average of any agent's expected utility remains invariant under any deviation to a fixed strategy.) The replicator equation is equivalent to $\frac{d}{dt}\ln x_i = u_{i0} - \hat{u}_i$ as well as $\frac{d}{dt}\ln(1 - x_i) = u_{i1} - \hat{u}_i$, where $\hat{u}_i = x_i u_{i0} + (1 - x_i) u_{i1}$, i.e., the expected utility of agent $i$ when taking into account his randomized action as well.

Next, we will isolate the probability-related terms on the LHS and all the utility-related terms on the RHS, and we will integrate over a time interval $[0, T]$ and divide by $T$. I.e.,

$$\frac{1}{T}\int_0^T \frac{\dot{x}_i}{x_i}\,dt = \frac{1}{T}\int_0^T (u_{i0} - \hat{u}_i)\,dt \qquad (3)$$

$$-\frac{1}{T}\int_0^T \frac{\dot{x}_i}{1 - x_i}\,dt = \frac{1}{T}\int_0^T (u_{i1} - \hat{u}_i)\,dt \qquad (4)$$

However, by a simple change of variables we have that $\int_0^T \frac{\dot{x}_i}{x_i}\,dt = \ln x_i(T) - \ln x_i(0)$ and $\int_0^T \frac{\dot{x}_i}{1 - x_i}\,dt = \ln(1 - x_i(0)) - \ln(1 - x_i(T))$; since the trajectory stays bounded away from the boundary, these quantities are bounded, so the LHS of equations 3, 4 converge to zero as $T \to \infty$. Moreover, since the trajectories are periodic (e.g., with period $T^*$), the time averages of $u_{i0}$, $u_{i1}$, and $\hat{u}_i$ all exist. Thus, for any agent $i$: