Vortices Instead of Equilibria in MinMax Optimization: Chaos and Butterfly Effects of Online Learning in Zero-Sum Games
We establish that algorithmic experiments in zero-sum games "fail miserably" to confirm the unique, sharp prediction of maxmin equilibration. Contradicting nearly a century of economic thought that treats zero-sum games almost axiomatically as the exemplar of economic stability, we prove that no meaningful prediction can be made about the day-to-day behavior of online learning dynamics in zero-sum games. Concretely, Multiplicative Weights Update (MWU) with constant step-size is Lyapunov chaotic in the dual (payoff) space. Simply put, suppose an observer asks the agents playing Matching Pennies whether they prefer Heads or Tails (and by how much, in terms of aggregate payoff so far). The range of possible answers consistent with any arbitrarily small set of initial conditions blows up exponentially with time everywhere in the payoff space. This result is robust both algorithmically and game-theoretically: 1) Algorithmic robustness: Chaos is robust to agents using any Follow-the-Regularized-Leader (FTRL) algorithm (e.g., gradient descent), the well-known class of regret-minimizing dynamics, even when agents mix and match dynamics or use different, or even slowly decreasing, step-sizes. 2) Game-theoretic robustness: Chaos is robust to all affine variants of zero-sum games (strictly competitive games), to network variants with arbitrarily large numbers of agents, and even to competitive settings beyond these. Our result stands in stark contrast to the time-average convergence of online learning to (approximate) Nash equilibrium, a result widely reported as "(weak) convergence to equilibrium".
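The sensitivity described above can be observed numerically. Below is a minimal sketch (not code from the paper) of MWU with constant step-size on Matching Pennies, tracking each player's cumulative per-action payoffs (the dual coordinates) for two nearby initial conditions; the step-size EPS, horizon T, and perturbation DELTA are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Row player's payoff matrix for Matching Pennies (zero-sum: column player gets the negative).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

EPS, T, DELTA = 0.1, 200, 1e-6  # hypothetical step-size, horizon, and initial perturbation


def run_mwu(x0, y0):
    """Run MWU for both players; return the row player's cumulative payoff vector (dual space)."""
    cx, cy = np.zeros(2), np.zeros(2)          # cumulative per-action payoffs
    for _ in range(T):
        wx = x0 * np.exp(EPS * cx)             # MWU weights: initial weights times exp of payoffs
        wy = y0 * np.exp(EPS * cy)
        x, y = wx / wx.sum(), wy / wy.sum()    # normalize to mixed strategies
        cx += A @ y                            # row player's expected payoff per action
        cy += -A.T @ x                         # column player's expected payoff per action
    return cx


# Two arbitrarily close initial conditions for the row player.
cx1 = run_mwu(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
cx2 = run_mwu(np.array([0.5 + DELTA, 0.5 - DELTA]), np.array([0.5, 0.5]))
print("divergence in dual (payoff) space:", np.abs(cx1 - cx2).max())
```

Under this sketch, increasing T and comparing trajectories started DELTA apart is one way to visualize the exponential divergence of cumulative payoffs that the Lyapunov-chaos result describes.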