Backward induction for repeated games

04/19/2018 · Jules Hedges

We present a method of backward induction for computing approximate subgame perfect Nash equilibria of infinitely repeated games with discounted payoffs. This uses the selection monad transformer, combined with the searchable set monad viewed as a notion of 'topologically compact' nondeterminism, and a simple model of computable real numbers. This is the first application of Escardó and Oliva's theory of higher-order sequential games to games of imperfect information, in which (as well as its mathematical elegance) lazy evaluation does nontrivial work for us compared with a traditional game-theoretic analysis. Since a full theoretical understanding of this method is lacking (and appears to be very hard), we consider this an 'experimental' paper heavily inspired by theoretical ideas. We use the famous Iterated Prisoner's Dilemma as a worked example.


1 Introduction

We present a method of backward induction for infinitely repeated games. Since this is impossible, we more precisely perform $\varepsilon$-backward induction, that is to say, we compute plays in which each player’s choice gives them an outcome $\varepsilon$-close to their optimal outcome, where $\varepsilon > 0$ is a small error bound. We do this by combining Escardó and Oliva’s interpretation of the product of selection functions as an unbounded generalisation of backward induction [9], together with the nondeterministic generalisation in [10] and a view of the searchable set monad [3] as a notion of ‘compact’ nondeterminism. Since a full theoretical understanding of this method is lacking (and appears to be very hard), we consider this an ‘experimental’ paper heavily inspired by theoretical ideas.

Backward induction is an algorithm for computing equilibria of games with sequential structure, known as early as Zermelo [22]. The essence of backward induction is counterfactual reasoning. Suppose two players sequentially choose from sets $X$ and $Y$, with payoffs given by $q_1, q_2 : X \times Y \to \mathbb{R}$, where $q_i(x, y)$ is the payoff of player $i$ given the choices $x$ and $y$. Player 1 reasons as follows: suppose she chose $x$, then player 2 would choose $y$ in order to maximise $q_2(x, y)$. Ignoring for now that there may be several such $y$ associated to each $x$, this defines a function $f : X \to Y$. (Accounting for different possible functions $f$ leads to games with multiple equilibria.) Player 1 then chooses $x$ in order to maximise $q_1(x, f(x))$.
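As a small illustration of this counterfactual reasoning, here is a minimal Haskell sketch of the two-stage case (our own illustration, not code from the paper; the names backward2, q1 and q2 are hypothetical):

import Data.List (maximumBy)
import Data.Ord (comparing)

-- Two-stage backward induction: player 2 best-responds to each hypothetical
-- move x, and player 1 maximises her payoff against that best response.
backward2 :: [x] -> [y] -> (x -> y -> Double) -> (x -> y -> Double) -> (x, y)
backward2 xs ys q1 q2 = (xstar, f xstar)
  where
    f x   = maximumBy (comparing (q2 x)) ys              -- player 2's best response to x
    xstar = maximumBy (comparing (\x -> q1 x (f x))) xs  -- player 1 maximises q1 x (f x)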

Selection functions and the selection monad, introduced by Escardó and Oliva in [6] and based on earlier work of Escardó in constructive topology [3], present backward induction in such a way that it falls out of general considerations in category theory. Moreover this reveals the non-obvious fact, often directly contradicted in standard game theory texts, that backward induction can be generalised to unbounded games which have arbitrarily long plays [9].

Both Zermelo’s original presentation and Escardó-Oliva’s generalisation are defined only for games of perfect information, in which all players are aware of all relevant information. However modern presentations of backward induction are defined for arbitrary games in extensive form: the additional step is that whenever a ‘simultaneous’ subgame (i.e. a nontrivial information set) is reached, it should be replaced by a Nash equilibrium computed by some other method (see for example [18, proposition 9.B.3]).

An important class of games that includes both simultaneous and sequential aspects is that of repeated games [17], including the famous iterated prisoner’s dilemma. In these games, a simple simultaneous ‘stage game’ is played repeatedly (usually infinitely), with players at each stage being able to observe the complete play of all previous stages but not the actions of the other players in the current stage, and total payoffs being given by a convergent infinite sum. The single-stage prisoner’s dilemma is a simple model of the breakdown of trust and cooperation: in the game’s unique Nash equilibrium (mutually stable choice of strategies) both players defect (betray each other) and receive less payoff than they could by cooperating. By an argument using backward induction, in any finite repetition of the prisoner’s dilemma, both players will also always defect. However the infinitely repeated prisoner’s dilemma also has cooperative equilibria such as tit-for-tat and grim trigger strategies, in which cooperation is enforced by the threat of later retribution. (These games and strategies will all be defined later.)

In this paper we demonstrate the application of Escardó and Oliva’s theory to games of imperfect information, using the iterated prisoner’s dilemma as a worked example. We do this using the nondeterministic variant of the selection monad introduced in [10], defining a new operator called the sum of selection functions that nondeterministically chooses a Nash equilibrium of a finite simultaneous game by brute force search, based on [14]. Two further innovations are required:

  • Defining the infinite discounted sum of payoffs requires using a simple model of computable reals, rather than an approximation such as rationals or machine-precision doubles. (Using an approximation would amount to disregarding play after finitely many stages, which drastically changes the game, for example ruling out cooperation in the iterated prisoner’s dilemma.) Specialising to the iterated prisoner’s dilemma and using Haskell allows us to essentially trivialise this step using lazy streams of digits in some base. However, the fact that computable reals do not have decidable inequality is a major crux, and prevents us from doing the impossible. Because of this restriction, we compute only approximate equilibria.

  • If we represent the nondeterministic choice of Nash equilibrium using representations based on the list or continuation monads, we find empirically that the resulting algorithm fails to terminate when combined with unbounded backward induction. Instead we use searchable sets, which support unbounded backtracking. Viewing the searchable set monad as a notion of ‘compact’ nondeterminism is a small but novel change of perspective.

This paper represents work done by the author several years ago as a continuation of [11]. A good understanding of this method, and especially a game-theoretic understanding of the nondeterministic selection monad defined in [10], remains elusive. It should therefore be considered as an ‘experimental’ paper, presenting the method with intuitive justification but little theoretical analysis, which is currently ongoing work by the author with Joe Bolt and Philipp Zahn.

This paper switches between mathematical notation and Haskell. A full Haskell implementation can be found at http://www.cs.ox.ac.uk/people/julian.hedges/code/RepeatedGames.hs.

Overview

In sections 2 and 3 we introduce simultaneous, sequential and repeated games. In section 4 we define selection functions and implement them in Haskell. In sections 5 and 6 we relate selection functions to sequential and simultaneous games respectively. Sections 7 and 8 concern searchable sets, and section 9 concerns computable reals. Finally section 10 puts all the pieces together, discusses the results and summarises problems for future work.

2 Simultaneous games

In this section and the next we will introduce enough game theory from scratch in order to understand our worked example, the iterated prisoner’s dilemma (IPD).

Informally, a game in the sense of game theory is determined by the following data:

  1. A set of players

  2. For each player, a set of choices available at one or more points in the game at which that player makes a choice

  3. For each of those choices, a determination of the information available when the choice is made; more precisely, a set of observations that the player could make before making the choice; this determines a set of strategies for the move, which are functions from observations to choices

  4. The ‘internal dynamics’ of the game, which means a determination of the observation that will be made by each player given the choices of previous players; this requires that the temporal sequence of choices made is well-ordered

  5. For each player, a real number called the payoff, determined by all choices made by all players (the play)

This is traditionally formalised using extensive form games, due to [21]. However, since we only need certain special cases in this paper we will instead make several more specific definitions guided by this general template. The extensive form represents a game via its tree of plays, with choices taking place at nodes and payoff awarded at leaves; the information available to players is represented by a partition of nodes into ‘information sets’ which cut across the tree structure. More information can be found in any game theory textbook, for example [18, 16].

Definition 1.

An $n$-player normal form game is defined by sets of choices $X_1, \ldots, X_n$ and payoff functions
$$q_i : \prod_{j=1}^n X_j \to \mathbb{R}$$
for each player $1 \le i \le n$.

In a normal form game, each player $i$ simultaneously makes a single choice from $X_i$. The term ‘simultaneous’ means that each player has no information available when making their choice. This means that the set of possible observations is a singleton $\{ * \}$, where $*$ is a dummy observation representing ‘nothing observed’, and so the strategy for player $i$’s (single) choice is nothing but an element of $X_i$, the choice itself. (This is sometimes called a pure strategy, to distinguish it from a mixed strategy, which is a probability distribution over choices.)

Definition 2.

A Nash equilibrium of a normal form game is a tuple of choices $x \in \prod_{i=1}^n X_i$ with the property that for each player $i$,
$$q_i(x) \ge q_i(x[i \mapsto x_i']) \quad \text{for all } x_i' \in X_i$$
where $x[i \mapsto x_i']$ is the tuple defined by
$$(x[i \mapsto x_i'])_j = \begin{cases} x_i' & \text{if } j = i \\ x_j & \text{otherwise} \end{cases}$$
for $1 \le j \le n$.

Informally, a Nash equilibrium is a tuple of strategies (called a strategy profile) with the property that no player can strictly increase their payoff by deviating unilaterally to a different strategy. In this sense a Nash equilibrium is a combination of strategies that is self-enforcing, or more exactly, non-self-defeating. A normal form game may have zero, one or many Nash equilibria.

Perhaps the most famous example of a normal form game is the prisoner’s dilemma. This is a 2-player game in which the set of choices of the two players are $X_1 = X_2 = \{ C, D \}$, where $C$ stands for cooperate and $D$ stands for defect. The payoffs are given as follows:
$$q(C, C) = (2, 2) \qquad q(C, D) = (0, 3) \qquad q(D, C) = (3, 0) \qquad q(D, D) = (1, 1)$$
This game has exactly one Nash equilibrium, namely $(D, D)$. Both players would receive a higher payoff if $(C, C)$ was played, but neither can trust the other to not ‘betray’ them by deviating back to $D$. In this sense, the prisoner’s dilemma is a simple mathematical model of a breakdown of trust or cooperation.
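To make Definition 2 concrete, here is a small self-contained Haskell check (an illustration of ours, separate from the paper’s code; the names PDMove, pdPayoff and nash are hypothetical) that mutual defection is the only Nash equilibrium of this payoff matrix:

data PDMove = Coop | Defect deriving (Show, Eq)

pdPayoff :: (PDMove, PDMove) -> (Int, Int)
pdPayoff (Coop, Coop)     = (2, 2)
pdPayoff (Coop, Defect)   = (0, 3)
pdPayoff (Defect, Coop)   = (3, 0)
pdPayoff (Defect, Defect) = (1, 1)

-- A pair is a Nash equilibrium iff neither player gains by a unilateral deviation.
nash :: [(PDMove, PDMove)]
nash = [ (x, y) | x <- ms, y <- ms
                , all (\x' -> fst (pdPayoff (x, y)) >= fst (pdPayoff (x', y))) ms
                , all (\y' -> snd (pdPayoff (x, y)) >= snd (pdPayoff (x, y'))) ms ]
  where ms = [Coop, Defect]
-- > nash  ==>  [(Defect,Defect)]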

3 Sequential and repeated games

The next class of games we consider are the sequential games. These are a simplification due to Escardó and Oliva of the games of perfect information, those games in which players can observe everything that happened in the past. They come in two variants: bounded and unbounded. We will present only the unbounded version, which is more general.

In a sequential game, the set of players is totally ordered, giving the order in which moves are made, and each player can observe the list of moves made by previous players. (In a general game of perfect information, the player making a choice and the set of choices available may depend on the previous choices.) We restrict to infinite games, in which the set of players is countably infinite, and the order-type is $\omega$. (In particular, the game has a first player who observes nothing.) We further restrict to monomorphic games, in which all players make choices from the same set.

Definition 3.

An unbounded monomorphic sequential game is determined by a set $X$ of choices, together with a continuous function
$$q : X^\omega \to \mathbb{R}^\omega$$
A strategy profile of such a game is a function $\sigma : X^* \to X$ which, given a finite list of choices observed by player $i$, gives the move of player $i$.

In the mathematical parts of this paper we carefully distinguish finite lists $X^*$ from streams $X^\omega$, although they are conflated in the Haskell code. We will not formalise the meaning of continuous in this paper, which requires some topology, but it roughly means that computing the output to finite precision requires only knowing a finite prefix of the input stream. Games with a discontinuous payoff function, such as the dollar auction, require other techniques [15] and can have more pathological behaviour.

Clearly, a strategy profile $\sigma$ determines a stream of choices called its strategic play. More generally, given a strategy profile $\sigma$ and a partial play $p \in X^*$, we define a play $\mathrm{ext}_\sigma(p) \in X^\omega$, called the strategic extension of $p$ by $\sigma$, by the course-of-values recursion
$$(\mathrm{ext}_\sigma(p))_i = \begin{cases} p_i & \text{if } i \le |p| \\ \sigma\big((\mathrm{ext}_\sigma(p))_1, \ldots, (\mathrm{ext}_\sigma(p))_{i-1}\big) & \text{otherwise} \end{cases}$$
This is the play which begins with $p$, and afterwards is played according to the strategies $\sigma$.
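A Haskell rendering of this course-of-values recursion (an illustrative sketch of ours; the names Strategy and ext are not from the paper) makes the laziness explicit:

-- A strategy profile maps the finite history observed so far to the next move.
type Strategy x = [x] -> x

-- ext sigma p is the play that begins with p and is thereafter played by sigma.
ext :: Strategy x -> [x] -> [x]
ext sigma p = p ++ go p
  where go history = let x = sigma history in x : go (history ++ [x])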

Definition 4.

A subgame perfect equilibrium of a sequential game is a strategy profile $\sigma$ such that for all partial plays $p$ of length $i - 1$ and all $x \in X$,
$$q_i(\mathrm{ext}_\sigma(p)) \ge q_i(\mathrm{ext}_\sigma(px))$$
where $px$ is the sequence obtained by extending $p$ with $x$.

Next we turn our attention to repeated games, which have both a sequential and a simultaneous aspect. A repeated game comes from taking a simultaneous game and playing it infinitely often, summing the resulting payoffs. A good introduction to repeated games can be found in chapter 2 of [17].

A repeated game consists of a normal form stage game played infinitely often, where in each stage the players can observe the choices made by all players in all previous stages, but not the choices of other players in the current stage. If a player receives the infinite stream of payoffs $u_1, u_2, \ldots$ from the stage games, their total payoff is defined to be the discounted sum
$$\sum_{k=1}^\infty \delta^k u_k$$
where $0 < \delta < 1$ is the discount factor. There are several ways to interpret the meaning of $\delta$. It can be seen as a mere mathematical trick to make the total payoff converge, allowing us to avoid specifying explicit preferences on streams of payoffs. It can be viewed as a measure of the ‘impatience’ of the players, how much they prefer immediate utility to deferred utility, and $1 - \delta$ can alternatively be viewed as the probability that the game terminates after each round, with the discounted sum representing the expected payoff.

Definition 5.

Given an $n$-player stage game with move sets $X_1, \ldots, X_n$ and payoff functions $q_i : \prod_{j=1}^n X_j \to \mathbb{R}$:

  • The set of plays of the repeated game is $\left( \prod_{j=1}^n X_j \right)^\omega$

  • The $i$th player’s payoff function is given by
    $$Q_i(x) = \sum_{k=1}^\infty \delta^k q_i(x_k)$$
    where $0 < \delta < 1$ is a fixed discount factor

  • A strategy for player $i$ in the resulting repeated game is a function $\sigma_i : \left( \prod_{j=1}^n X_j \right)^* \to X_i$

As for sequential games, a choice of strategy for each player determines a strategic play. We also define a strategic extension operator for repeated games: Given a strategy profile $\sigma$ and a partial play $p$ with $m$ stages, the stream $\mathrm{ext}_\sigma(p)$ is defined by
$$(\mathrm{ext}_\sigma(p))_k = \begin{cases} p_k & \text{if } k \le m \\ \big( \sigma_i((\mathrm{ext}_\sigma(p))_1, \ldots, (\mathrm{ext}_\sigma(p))_{k-1}) \big)_{i=1}^n & \text{otherwise} \end{cases}$$

A strategy profile $\sigma$ is called a subgame perfect equilibrium if for all partial plays $p$ of length $m$, all players $i$ and all deviations $x_i' \in X_i$,
$$Q_i(\mathrm{ext}_\sigma(p)) \ge Q_i(p')$$

Here $p'$ is the play where:

  • In the first $m$ rounds, $p$ is played

  • In the $(m+1)$th round, player $i$ plays $x_i'$ and all other players play according to $\sigma$

  • In rounds greater than $m + 1$, all players play according to $\sigma$

Consider a repeated form of the prisoner’s dilemma, with discount factor $\delta$. Given a stream of plays $x \in (\{C, D\}^2)^\omega$, the payoffs are given respectively by
$$Q_i(x) = \sum_{k=1}^\infty \delta^k q_i(x_k)$$
where $q_1, q_2$ are the payoff functions given in the previous section. One example of a subgame perfect equilibrium is to play the stage equilibrium $(D, D)$ in every stage irrespective of earlier play. If we only finitely repeat the prisoner’s dilemma (equivalent to modifying $Q_i$ to use only a finite sum), this is the only subgame perfect equilibrium. However, in the infinitely repeated game there are many subgame perfect equilibria, some of which have plays in which $(C, C)$ is always played. Possible payoffs resulting from subgame perfect equilibria of infinitely repeated games are characterised by the folk theorems [17, chapter 3].

An example of a cooperative subgame perfect equilibrium is as follows. Player 1 plays the strategy tit for tat:
$$\sigma_1(p) = \begin{cases} C & \text{if } p \text{ is empty} \\ y & \text{if player 2’s move in the last stage of } p \text{ was } y \end{cases}$$
which initially cooperates, and otherwise copies the opponent’s previous move. Player 2 plays the strategy grim trigger:
$$\sigma_2(p) = \begin{cases} C & \text{if player 1 played } C \text{ in every stage of } p \\ D & \text{otherwise} \end{cases}$$
which cooperates as long as the opponent cooperates, but defects for all time if the opponent defects. The strategic play of this subgame perfect equilibrium is $(C, C)$ in every stage.
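These two strategies, and their strategic play, can be written directly in Haskell (an illustrative sketch of ours, using the Move type defined in section 6; the function names are hypothetical):

-- Histories are lists of stage plays (player 1's move, player 2's move).
titForTat :: [(Move, Move)] -> Move     -- played by player 1
titForTat []      = C
titForTat history = snd (last history)  -- copy the opponent's previous move

grimTrigger :: [(Move, Move)] -> Move   -- played by player 2
grimTrigger history
  | all ((== C) . fst) history = C      -- cooperate while player 1 has always cooperated
  | otherwise                  = D      -- otherwise defect forever

-- The strategic play: the two strategies together produce (C,C) at every stage.
strategicPlay :: [(Move, Move)]
strategicPlay = go []
  where go h = let m = (titForTat h, grimTrigger h) in m : go (h ++ [m])
-- > take 3 strategicPlay  ==>  [(C,C),(C,C),(C,C)]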

4 Selection functions

In this section and the next we recall the theory of selection functions, which was developed mostly by Escardó and Oliva and can be found in many references including [3, 6, 7, 8, 10].

A selection function is a function of type $(X \to R) \to X$. We write this type as $\mathcal{J}_R X$. More generally, given a type constructor $T$, a $T$-selection function is a function of type $(X \to R) \to T X$, which we write as $\mathcal{J}_R^T X$. In Haskell:

newtype SelT r t x = SelT {runSelT :: (x -> r) -> t x}

instance (Functor t) => Functor (SelT r t) where
  fmap f (SelT e) = SelT (\k -> fmap f (e (k . f)))

For example, working for a moment in set theory, the $\arg\max$ operator over a set $X$ is a selection function of type $\mathcal{J}_{\mathbb{R}}^{\mathcal{P}} X$, where $\mathcal{P}$ is powerset and $\mathbb{R}$ is the set of real numbers. This operator takes a function $k : X \to \mathbb{R}$ to the set
$$\arg\max(k) = \{ x \in X \mid k(x) \ge k(x') \text{ for all } x' \in X \}$$
If $X$ is nonempty and finite then $\arg\max(k)$ is nonempty and finite for every $k$. Another example of a $\mathcal{P}$-selection function is $\arg\min$, which takes every function $k$ to the set of its minimisers.

If $T$ is a strong monad and $R$ is a $T$-algebra then $\mathcal{J}_R^T$ can be given the structure of a strong monad [10]. (Since every type is an algebra of the identity monad, $\mathcal{J}_R$ is a strong monad for every $R$.) We begin by setting up a Haskell typeclass for $T$-algebras (requiring the MultiParamTypeClasses and FlexibleInstances language extensions):

class Algebra t a where structure :: t a -> a
instance Algebra Identity a where structure = runIdentity
instance (Functor t, Algebra t x, Algebra t y) => Algebra t (x, y) where
  structure a = (structure (fmap fst a), structure (fmap snd a))

The definition of the monad structure on $\mathcal{J}_R^T$ is as follows:

instance (Monad t, Algebra t r) => Monad (SelT r t) where
  return = SelT . const . return
  SelT e >>= f = SelT (\k -> let g x = runSelT (f x) k
                                 h x = structure (fmap k (g x))
                              in e h >>= g)

This is admittedly a hard definition to understand. A relatively gentle explanation, relating to the continuation monad, can be found in [12].

In order to compute $\arg\max$ with outcomes in an ordered type, we need to do a brute force search. To do this cleanly we define a type class for finite types, which have an exhaustive list of elements:

class (Eq x) => Finite x where exhaust :: [x]
instance (Finite x, Finite y) => Finite (x, y) where
  exhaust = [(x, y) | x <- exhaust, y <- exhaust]

Now we can define:

argmax :: (Finite x, Ord r) => SelT r [] x
argmax = SelT (\k -> [x | x <- exhaust, all (\x' -> k x >= k x') exhaust])

A useful fact about the selection monad is that it is contravariant in the outcome type: Given a function $f : S \to R$ we obtain a monad morphism $\mathcal{J}_R^T \to \mathcal{J}_S^T$ [13, section 1.1.8]. In Haskell:

reindex :: (s -> r) -> SelT r t x -> SelT s t x
reindex f (SelT e) = SelT (\k -> e (f . k))

(This should be contrasted with the continuation monad, which is not functorial in the outcome type.) In particular, reindexing by the $i$th projection $\pi_i : \mathbb{R}^n \to \mathbb{R}$ yields the selection function of the $i$th player in an $n$-player game, who optimises the $i$th coordinate of the outcome and is indifferent about the others.

5 The product of selection functions

Let $T$ be a strong monad and $R$ a $T$-algebra. As a strong monad, $\mathcal{J}_R^T$ admits a binary monoidal product
$$\otimes : \mathcal{J}_R^T X \times \mathcal{J}_R^T Y \to \mathcal{J}_R^T (X \times Y)$$
In Haskell’s do-notation, this monoidal product operator is especially intuitive:

otimes :: (Monad t) => t x -> t y -> t (x, y)
otimes a b = do {x <- a; y <- b; return (x, y)}

We can also fold this operator across finite lists to give finite products $\bigotimes_{i=1}^n$, and across streams to give infinite products $\bigotimes_{i=1}^\infty$. Both of these folds are implemented by the Haskell prelude function sequence :: (Monad t) => [t a] -> t [a]; we will return to the question of productiveness on streams (i.e. whether each element is computed in finite time) later.

When $T$ is the identity monad, the following fundamental theorem connects selection functions with game theory [6, 8, 9]:

Theorem 1.

Let $G$ be a monomorphic unbounded sequential game defined by the choice set $X$ and the continuous outcome function $q : X^\omega \to \mathbb{R}^\omega$. For each $i$ let $\varepsilon_i : \mathcal{J}_{\mathbb{R}} X$ be a selection function such that $k(\varepsilon_i(k)) = \max_{x \in X} k(x)$ for every $k : X \to \mathbb{R}$. Then
$$\left( \bigotimes_{i=1}^\infty \mathrm{reindex}(\pi_i)(\varepsilon_i) \right)(q)$$
is well-defined and is the strategic play of a subgame perfect equilibrium of $G$.

In fact, this theorem has nothing to do with the $\max$ operator: Escardó and Oliva define higher order sequential games whose definition involves selection functions, and prove that the product of selection functions computes plays of subgame perfect equilibria in this more general case.

This infinite product can be directly implemented in Haskell using sequence, producing a productive stream giving the play of a subgame perfect equilibrium [7].
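As a minimal finite illustration of Theorem 1 (our own example, assuming the SelT definitions of section 4 plus the Functor/Applicative boilerplate GHC requires for a Monad instance), consider a 3-stage game over Bool in which every player maximises the same outcome, namely the number of Trues played:

countTrues :: [Bool] -> Int
countTrues = length . filter id

-- A single-valued argmax over Bool, i.e. an ordinary selection function.
argmaxBool :: (Ord r) => SelT r Identity Bool
argmaxBool = SelT (\k -> Identity (if k True >= k False then True else False))

-- The product of three copies computes a play of a subgame perfect equilibrium.
spePlay :: [Bool]
spePlay = runIdentity (runSelT (sequence (replicate 3 argmaxBool)) countTrues)
-- > spePlay  ==>  [True,True,True]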

We briefly digress to consider the (largely not understood) game-theoretic meaning of the monad $\mathcal{J}_R^{\mathcal{P}}$ where $\mathcal{P}$ is the nonempty finite powerset monad. The monoidal product of this monad is used for proof-theoretic purposes in [10]. For simplicity we consider a finite sequential game with $n$ stages, given by the payoff function $q : X^n \to \mathbb{R}^n$.

For nonempty finite $X$, $\arg\max$ itself has the type $\mathcal{J}_{\mathbb{R}}^{\mathcal{P}} X$. It is therefore reasonable to ask whether there is a choice of $\mathcal{P}$-algebra (affine semilattice) structure on $\mathbb{R}^n$ such that the (nonempty finite) set
$$\left( \bigotimes_{i=1}^n \mathrm{reindex}(\pi_i)(\arg\max) \right)(q)$$
is the set of all strategic plays of subgame perfect equilibria. This does not appear to be the case, however. Characterising sets that can be defined this way in game-theoretic terms is ongoing work with Joe Bolt and Philipp Zahn.

6 The sum of selection functions

Previous work on the product of selection functions (for example [6, 7, 8]) has considered only games of perfect information, which means that whenever a player makes a choice, they have access to all relevant information about the choices of other players. More precisely, ‘having access’ means that their strategy is a function that can depend on this information. However, repeated games such as iterated prisoner’s dilemma are not games of perfect information, but rather have both simultaneous and sequential aspects. In this section we suggest a way to handle simultaneous choices in the selection function paradigm.

Games defined by explicit selection functions are called ‘higher order games’, by analogy to selection functions being higher order functions. In [14] a solution concept suitable for simultaneous higher order games was considered, called selection equilibrium.

Definition 6.

A 2-player higher order simultaneous game consists of the following data:

  • Sets $X$ and $Y$ of choices for the two players

  • A set $R$ of outcomes and an outcome function $q : X \times Y \to R$

  • For each player a multi-valued selection function $\varepsilon : \mathcal{J}_R^{\mathcal{P}} X$, $\delta : \mathcal{J}_R^{\mathcal{P}} Y$

A selection equilibrium is a pair $(x, y) \in X \times Y$ such that $x \in \varepsilon(\lambda x'. q(x', y))$ and $y \in \delta(\lambda y'. q(x, y'))$.

If $R = \mathbb{R}^2$, $\varepsilon = \mathrm{reindex}(\pi_1)(\arg\max)$ and $\delta = \mathrm{reindex}(\pi_2)(\arg\max)$ then the definition of selection equilibrium reduces to ordinary Nash equilibrium. Note that here we do not assume that $R$ is a $\mathcal{P}$-algebra, so $\mathcal{J}_R^{\mathcal{P}}$ may not be a monad.

The definition of selection equilibria crucially relies on the selection functions being multi-valued. However, the product of selection functions is studied mainly for single-valued selection functions, and its generalisation to nondeterministic selection functions is poorly understood in game-theoretic terms. This barrier has prevented a unification of higher-order sequential and simultaneous games.

We propose the following definition, called the sum of selection functions, which returns the set of selection equilibria, analogously to the product of selection functions for subgame perfect equilibria. (The terminology ‘sum’ is based on an optimistic hope that it has a nice algebraic interaction with the product of selection functions, although this is left for future work.)

Definition 7.

The binary operator
$$\oplus : \mathcal{J}_R^{\mathcal{P}} X \times \mathcal{J}_R^{\mathcal{P}} Y \to \mathcal{J}_R^{\mathcal{P}} (X \times Y)$$
is defined by
$$(\varepsilon \oplus \delta)(k) = \{ (x, y) \mid x \in \varepsilon(\lambda x'. k(x', y)) \text{ and } y \in \delta(\lambda y'. k(x, y')) \}$$

A problem with $\oplus$ is that $(\varepsilon \oplus \delta)(k)$ can be empty even if $\varepsilon$ and $\delta$ are never empty. For example, if $X = Y = \{ H, T \}$ and $k : X \times Y \to \mathbb{R}^2$ is defined by
$$k(x, y) = \begin{cases} (1, 0) & \text{if } x = y \\ (0, 1) & \text{otherwise} \end{cases}$$
then $(\mathrm{reindex}(\pi_1)(\arg\max) \oplus \mathrm{reindex}(\pi_2)(\arg\max))(k) = \emptyset$. (This is the game matching pennies.) While it is often demanded that nondeterminism is represented by non-empty powerset for partly philosophical reasons (the empty set represents failure or partiality of a computation), in this paper we have a specific reason to be wary of it: In the next section we are going to represent nondeterminism using searchable sets, and the empty set is not searchable.

We will ignore this problem in this paper because it does not come up in the iterated prisoner’s dilemma example; in the next section when we define the function searchList we will make it throw an exception on the empty list, and we find in practice that no exception is thrown. (Dealing with this problem properly will require some more work, for example adding probabilistic strategies and relying on Nash’s theorem [20], or alternatively adding an explicit ‘failure’ strategy using an exception monad, with explicit strategic preferences defined over the exception using monad algebras.)

We then define the sum of selection functions by a brute force search over finite sets.

oplus :: (Finite x, Finite y) => SelT r [] x -> SelT r [] y -> SelT r [] (x, y)
oplus (SelT e) (SelT d) = SelT (\k -> [(x, y) | (x, y) <- exhaust,
  x `elem` (e (\x' -> k (x', y))),
  y `elem` (d (\y' -> k (x, y')))])

We now demonstrate that the sum of selection functions correctly computes the unique Nash equilibrium of the (single-stage) prisoner’s dilemma. We define our stage game, the prisoner’s dilemma, in Haskell:

data Move = C | D deriving (Show, Eq)
instance Finite Move where exhaust = [C, D]

Although we could define outcomes as integers or doubles, we will instead define a specific datatype, since we will be reusing it later as part of our representation of computable reals. We call it ‘quit’, short for quaternary digit:

data Quit = Zero | One | Two | Three deriving (Show, Eq, Ord)

Note that the Ord instance derived by Haskell is Zero < One < Two < Three.

Now the payoff function of prisoner’s dilemma is

pd :: (Move, Move) -> (Quit, Quit)
pd (C, C) = (Two, Two)
pd (C, D) = (Zero, Three)
pd (D, C) = (Three, Zero)
pd (D, D) = (One, One)

The first and second players respectively choose a single move to maximise the first and second coordinate of the outcome, given by reindex fst argmax and reindex snd argmax. We take the sum of these selection functions, and apply it to the outcome function pd:

> runSelT (reindex fst argmax `oplus` reindex snd argmax) pd
==> [(D,D)]
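By contrast, and as anticipated in the previous section, the same construction applied to matching pennies returns the empty list (an illustrative check of ours; the name matchingPennies is hypothetical):

instance Finite Bool where exhaust = [False, True]

-- Player 1 wins if the two coins match, player 2 wins if they differ.
matchingPennies :: (Bool, Bool) -> (Int, Int)
matchingPennies (x, y) = if x == y then (1, 0) else (0, 1)

-- > runSelT (reindex fst argmax `oplus` reindex snd argmax) matchingPennies
-- ==> []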

7 Searchable sets

The property of the monad $\mathcal{J}_R^T$ that makes it suitable for working with unbounded games is that it supports an infinite monoidal product $\bigotimes_{i=1}^\infty$. In particular, recall from section 5 that the Haskell Prelude function sequence :: (Monad m) => [m x] -> m [x], when specialised to the selection monad, is productive on infinite lists.

Monads for which sequence is productive on infinite lists in Haskell include the identity, state, IO, reader, writer and selection monads, and monad transformer stacks containing only these. Monads for which sequence is not productive include the Maybe, list and continuation monads, and monad transformer stacks containing any of these. These are empirical observations only: a theoretical characterisation of these monads is lacking. Such a characterisation would have to explain the large difference in behaviour between the seemingly similar types $(X \to R) \to X$ (selection monad) and $(X \to R) \to R$ (continuation monad).
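For instance, the claim for the identity monad can be checked directly in GHCi (an illustrative check of ours):

> take 3 (runIdentity (sequence (repeat (Identity 0))))
==> [0,0,0]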

In particular, the monad $\mathcal{J}_R^T$ does not admit infinite products when $T$ is the list monad, which is the most straightforward and common representation of nondeterminism in Haskell, and the one we used in section 6. That is to say, if we simply apply unbounded backward induction at this point then the resulting function will fail to terminate. We must instead find a suitable alternative to the list monad to represent nondeterminism. We find this in the searchable set monad, which is the special case $\mathcal{J}_{\mathrm{Bool}}$ of the selection monad [3, 5].

Definition 8.

A subset $S$ of a type $X$ is defined to be its characteristic function $S : X \to \mathrm{Bool}$. For an element $x : X$, we write $x \in S$ if $S(x)$ is True. A subset $S$ is called searchable if there is a selection function $\varepsilon : (X \to \mathrm{Bool}) \to X$ with the following two properties:

  • $\varepsilon(p) \in S$ for all predicates $p : X \to \mathrm{Bool}$

  • For all $p$, if there exists an element of $S$ satisfying $p$ then $\varepsilon(p)$ is such an element

If this is the case, we say that $\varepsilon$ represents $S$.

This is a constructive analogue of topological compactness. Notice however that the empty subset is never searchable, since the selection function must always return an element of the subset.

We set up a Haskell type for this special case:

type Searchable = SelT Bool Identity

searchable :: ((x -> Bool) -> x) -> Searchable x
searchable e = SelT (Identity . e)

search :: Searchable x -> (x -> Bool) -> x
search (SelT e) = runIdentity . e

Given any nonempty finite list of elements of a type $X$, we can produce a searchable set containing only those elements by searching the list:

searchList :: [x] -> Searchable x
searchList [] = error "searchList: Empty list"
searchList xs = searchable (\p -> case Data.List.find p xs of
  Nothing -> head xs
  Just x  -> x)

Using searchList, we can immediately ‘promote’ a list-based nondeterministic selection function to a searchable set-based one:

promote :: SelT r [] x -> SelT r Searchable x
promote (SelT e) = SelT (searchList . e)

Searchable sets are closed under several constructions analogous to compact topological spaces, notably forward images using fmap and countable products using sequence. The latter is a constructive form of the countable Tychonoff theorem, and is used to produce ‘seemingly impossible functional programs’ that search the Cantor space (or [Bool]) in finite time [4].

Searchable sets admit decidable existential and universal quantification. Given a selection function $\varepsilon$ for a searchable set $S$ and a predicate $p$, we know that if any element of $S$ satisfies $p$ then $\varepsilon(p)$ is such an element. Therefore in order to check whether any element of $S$ satisfies $p$ it suffices to test whether $\varepsilon(p)$ satisfies $p$. In Haskell:

exists :: Searchable x -> (x -> Bool) -> Bool
exists e p = p (search e p)

The universal quantifier is defined by de Morgan duality $\forall x.\, p(x) \iff \neg \exists x.\, \neg p(x)$:

forall :: Searchable x -> (x -> Bool) -> Bool
forall e p = not (exists e (not . p))
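For example (an illustrative check of ours, using searchList from above):

> exists (searchList [1,2,3]) (> 2)
==> True
> forall (searchList [1,2,3]) (> 2)
==> False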

8 Compact nondeterminism

It is worth saying a few words on the view of the searchable set monad as a ‘notion of nondeterminism’. In Haskell it is possible to use the list monad to express a logic programming style of backtracking search. (The use of the nonempty powerset monad to represent nondeterminism is due to [19], based on earlier work on powerdomains in domain theory.) For example, suppose we define a binary nondeterministic choice operator as follows:

type Choice a = [a]

choose :: a -> a -> [a]
choose x y = [x, y]

Now consider the following program:

choose2 :: Choice (Int, Int)
choose2 = do {x <- 0 `choose` 1; y <- 0 `choose` 1; return (x, y)}

This program searches through the cartesian product $\{0, 1\} \times \{0, 1\}$. Given a predicate implemented as a Haskell function p :: (Int, Int) -> Bool, we can use a function like Data.List.find to ‘resolve’ the nondeterminism to a deterministic search for a satisfying input of p.

The previous program can be rewritten using the function sequence, which performs a list of actions in a monad in order:

choose2' :: Choice (Int, Int)
choose2' = do {[x, y] <- sequence [0 `choose` 1, 0 `choose` 1]; return (x, y)}

This form suggests attempting hypercomputation with an infinite sequence of nondeterministic choices searching for a satisfying input to a predicate p :: [Int] -> Bool on streams:

chooseInfinity :: Choice [Int]
chooseInfinity = sequence (repeat (0 `choose` 1))

Mathematically, sequence for the list monad computes cartesian products, and so this program seems like it should enumerate the (set-theoretically uncountable) Cantor set $\{0, 1\}^\omega$. Unsurprisingly this fails and the program does not terminate, but Haskell is unable even to produce the obvious first element [0,0,0,…]:

> sequence (repeat (0 `choose` 1))
==> *** Exception: stack overflow

(Intriguingly, the stack overflow happens before the opening square bracket is printed.)

If we replace the list monad with the searchable set monad, however, we can do precisely this.

chooseS :: a -> a -> Searchable a
chooseS x y = searchList (choose x y)
-- or chooseS x y = searchable (\p -> if p x then x else y)

choose2'' :: Searchable (Int, Int)
choose2'' = do {x <- 0 `chooseS` 1; y <- 0 `chooseS` 1; return (x, y)}

Given a predicate p :: (Int, Int) -> Bool, we resolve the nondeterministic choices to a deterministic search using search choose2'' p, which will return a satisfying input if one exists. Furthermore, this now extends to infinite search:

chooseInfinity' :: Searchable [Int]
chooseInfinity' = sequence (repeat (0 `chooseS` 1))

This can also be written more suggestively using direct recursion:

chooseInfinity'' :: Searchable [Int]
chooseInfinity'' = do x <- 0 `chooseS` 1
                      xs <- chooseInfinity''
                      return (x : xs)

Now given some p :: [Int] -> Bool, the stream search chooseInfinity' p is productive and satisfies p if possible. This is a seemingly impossible functional program.

The reason that this is possible is that as a computable function, the output of p can only depend on a finite prefix of its input stream, and lazy evaluation only evaluates as much as is needed. However, the operational behaviour of the seemingly impossible functional programs is still not well understood.
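For example, we can ask chooseInfinity' for a stream whose first three entries sum to 2; only a finite prefix of the stream is ever inspected (an illustrative check of ours):

> take 5 (search chooseInfinity' (\xs -> sum (take 3 xs) == 2))
==> [0,1,1,0,0]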

9 Computable reals

Our stage game, the prisoner’s dilemma, has outcomes in $\{0, 1, 2, 3\}^2$, and the iterated game is further parameterised by a discount factor $0 < \delta < 1$. For convenience we choose the discount factor to be $\delta = \frac{1}{4}$. This means that the discounted sum
$$\sum_{k=1}^\infty \delta^k u_k = \sum_{k=1}^\infty \frac{u_k}{4^k}$$
can be represented as an infinite stream of digits in base 4, where the $k$th digit is precisely $u_k$. That is to say, the discounted sum is represented by the identity function on streams.

Recalling the type of quaternary digits we defined in section 6, we define a Haskell type of real numbers in the unit interval represented as streams of quaternary digits, or quit-streams:

type R = [Quit]

Due to our trick of choosing a base-4 representation, we can avoid needing to define arithmetic operations on infinite quit-streams in the specific example of the iterated prisoner’s dilemma. This is fortunate, because they cannot be defined in general. More generally, representations of real numbers based on infinite streams of digits (which are possibly the most naive or obvious representation, at least in a language such as Haskell with direct support for streams) support constructive topology, but not constructive arithmetic. There does exist a model of the real numbers that supports both arithmetic and topology constructively: the signed digit stream model [23, 2]. Using this would be necessary for a more general implementation.

The important piece of work we have to do is to define an approximate ordering on R. As is well known, computable real numbers do not admit a computable ordering. They do admit a semi-computable ordering that terminates except on a set of measure zero, namely to search along a pair of streams looking for a difference and simply run forever if the streams are equal. However we find in practice that the product of selection functions does indeed search this diagonal, and will fail to terminate if we do this.
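The partial comparison just described looks as follows (an illustrative sketch of ours, not part of the paper's implementation; it compares the digit streams lexicographically and diverges when they are equal):

exactGreater :: R -> R -> Bool
exactGreater (x : xs) (y : ys)
  | x > y     = True
  | x < y     = False
  | otherwise = exactGreater xs ys   -- equal digits so far: keep searching, possibly forever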

At this point we fix a positive integer $n$, written precision in Haskell, which is the number of significant quaternary digits. In practice, we can only search up to $n = 4$ in a reasonable amount of time, although a slightly larger precision might be possible with enough optimising and a fast CPU. (The runtime of the product of selection functions has not been theoretically characterised, but is likely to be very fast-growing and is possibly nonelementary, i.e. the runtime grows as a stack of exponentials of height $n$.)

precision = 4 :: Int

A fact about digit stream representations is that they admit non-identity equalities, exemplified by the (in)famous fact that $0.999\ldots = 1$. Similarly, in the lazy base 4 representation One : repeat Zero denotes the same real number as Zero : repeat Three. This is a general fact about computable analysis (precisely, the fact is that the embedding of any model into the ‘true’ unit interval necessarily fails to be injective).

When dealing with repeated games we should take particular care about this, because it is crucial for cooperative equilibria such as tit-for-tat being subgame perfect that a finite payoff now can be exactly balanced by future payoffs. For example, ‘betraying’ your opponent in order to receive a larger payoff now, followed by lower payoffs forever after in eternal punishment, may need to be considered exactly as desirable as cooperating now and receiving a moderate reward in every future stage.

Motivated by this, we compare approximately by evaluating the first $n$ digits of our quit-streams as doubles, and then comparing to order $4^{-n}$. This is reasonable since $n$ is small in practice.

quit2Double :: Quit -> Double
quit2Double x = case x of {Zero -> 0.0; One -> 1.0; Two -> 2.0; Three -> 3.0}

real2Double :: R -> Double
real2Double xs = sum (zipWith f xs [1 .. precision])
  where f x n = quit2Double x * 0.25^n

greater :: R -> R -> Bool
greater xs ys = real2Double xs > real2Double ys - 0.25^(precision - 1)

greater xs ys computes xs $\ge$ ys up to precision $n$, i.e. if xs is slightly smaller than ys but we must search further than $n$ digits to discover the fact, then greater xs ys returns True. (Note that the Haskell function zipWith, when presented with lists of different lengths, will truncate the longer list. Since xs is infinite, real2Double takes the first $n$ digits of it.)
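For example, the two representations of the same real number discussed above compare as approximately equal in both directions (our own check; with precision = 4 both calls return True):

> greater (One : repeat Zero) (Zero : repeat Three)
==> True
> greater (Zero : repeat Three) (One : repeat Zero)
==> True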

Armed with the approximate ordering, we can now implement $\arg\max$ as a nondeterministic selection function, using Haskell’s list monad as a basic representation of nondeterminism.

eargmax :: (Finite a) => SelT R [] a
eargmax = SelT (\k -> [x | x <- exhaust, all (\x' -> k x `greater` k x') exhaust])

Since every function (on a finite set) has an attained maximum, eargmax always returns a nonempty list.

Furthermore, using quantifiers for searchable sets we can define an algebra which, given a selection function representing a searchable set $S$ of outcomes, searches for an element of $S$ that is (approximately) greater than or equal to every element of $S$. (Apart from the use of approximate inequality, this exhibits the elementary real analysis fact that a compact set of reals contains its maximum.) In Haskell:

instance Algebra Searchable R where
  structure e = search e (\x -> forall e (\y -> x `greater` y))

Unfortunately, the use of approximate inequality means that this does not obey the axioms of a monad algebra. We proceed anyway since it appears to work in practice. If the reader is worried about this, we could equivalently make the type of outcomes $T R$ (or Searchable R), which is the free $T$-algebra on $R$, and move this use of the approximate ordering into the selection function (whose type becomes $\mathcal{J}_{T R}^T X$). That is, instead of using the standard $\arg\max$ function directly, we generalise it to a ‘nondeterministic $\arg\max$’ that imposes its own ordering on compact sets of outcomes. Such variants of $\arg\max$ are considered in [11].

10 Putting it together

The payoff function of the iterated prisoner’s dilemma takes a stream of pairs of choices, and computes the discounted sum of payoffs according to pd. Due to our choice of representation, this is trivial:

ipd :: [(Move, Move)] -> (R, R)
ipd ms = (map (fst . pd) ms, map (snd . pd) ms)

We demonstrate that applying the product of selection functions directly over the list monad does not terminate:

stage :: SelT (R, R) [] (Move, Move)
stage = reindex fst eargmax `oplus` reindex snd eargmax

> :type runSelT (sequence (repeat stage)) ipd
==> runSelT (sequence (repeat stage)) ipd :: [[(Move, Move)]]
> runSelT (sequence (repeat stage)) ipd
==> *** Exception: stack overflow

If we first promote the stage game from the list to the searchable set monad, we obtain a searchable set of plays rather than a list of them:

plays :: Searchable [(Move, Move)]
plays = runSelT (sequence (repeat (promote stage))) ipd

In order to obtain an element of this set, we must supply it with a predicate. If we give the constant true predicate, we will obtain an arbitrary element of the set:

> :type search plays (const True)
==> search plays (const True) :: [(Move, Move)]

Since this is an infinite list (a play of the repeated game) we request a finite prefix:

> take 6 (search plays (const True))
==> [(D,D),(D,D),(D,D),(C,C),(C,C),(C,C)]

With precision set to 4, this takes around 5 minutes to run (interpreted) on the author’s laptop.

With a little experimentation, we find that the searchable set of plays contains precisely the plays whose first three stages are (D,D), i.e. the plays that have [(D,D),(D,D),(D,D)] as a prefix. If we define a predicate that is satisfied when any of the first three elements are not (D,D), we find that the searchable set does not contain any element satisfying the predicate:

> let p xs = xs!!0 /= (D,D) || xs!!1 /= (D,D) || xs!!2 /= (D,D)
> exists plays p
==> False

However, by an appropriate choice of predicate we can force the subsequent elements to be anything:

> let p' xs = xs!!4 == (D,D)
> take 6 (search plays p')
==> [(D,D),(D,D),(D,D),(C,C),(D,D),(C,C)]

The stage in which behaviour changes from D to undetermined is controlled by the precision. If the precision is reduced to 3 then the searchable set plays changes to the set of streams with [(D,D),(D,D)] as a prefix. Although increasing the precision above 4 is too slow to test in practice, presumably as the precision tends to infinity the searchable set plays will converge (in some suitable sense) to a singleton set containing only [(D,D),(D,D),(D,D),…]. This method is thus unable to compute the strategic plays of the (many) other subgame perfect equilibria of IPD.

There are two phenomena that demand an explanation here. The first is the switch in behaviour, determined by the precision parameter. This is because we are maximising only up to the precision $n$: after the switch in behaviour, the difference in payoff caused by different choices is smaller than the error bound. The reason why C is chosen by default is an implementation detail: ultimately it comes from the ordering [C,D] when we defined Move as an instance of Finite. If we had instead written instance Finite Move where exhaust = [D,C], this phenomenon would vanish and we would obtain [(D,D),(D,D),…] by default, but still be able to force C after the (no longer apparent) behaviour switch.

The second phenomenon is far more subtle. Ignoring rounding errors, it appears that we have obtained a singleton set containing only [(D,D),(D,D),…]. This is the limit of the solution set of an $n$-stage finitely iterated prisoner’s dilemma as $n \to \infty$. However, the infinitely iterated prisoner’s dilemma has a much larger solution set. (This sort of discontinuity is common in game theory.)

We leave the explanation of this as an open problem. One possible route to a solution is to be clear about the distinction between subgame perfect equilibria and equilibria that can arise by backward induction. It is common that game theory texts conflate these two things, or are imprecise about the distinction. A possible conjecture is that the set of ‘backward induction equilibria’ is continuous as the number of stages goes to infinity, whereas the set of subgame perfect equilibrium plays has a discontinuity at the limit. (The former limit is not a known concept in classical game theory, since backward induction for infinite games was only introduced in [6].)

The following points all need to be considered for a full understanding of this method:

  • The game-theoretic meaning of the monad $\mathcal{J}_R^{\mathcal{P}}$ for a finite game, where $\mathcal{P}$ is the powerset monad

  • The game-theoretic issues of extending from finite to infinite games, such as discontinuity of the solution set

  • The topological issues (for correctness) and computability issues (for termination) of using searchable sets

  • The effect of using $\varepsilon$-$\arg\max$, which can be seen as reducing an infinite game to a finite but unbounded one.

References

  • [2] Martin Escardó (1998): Effective and sequential definition by cases on the reals via infinite signed-digit numerals. In: Proceedings of the 3rd workshop on Computation and Approximation, ENTCS 13, doi:10.1016/S1571-0661(05)80214-2.
  • [3] Martin Escardó (2004): Synthetic topology of data types and classical spaces. ENTCS 87, pp. 21–156, doi:10.1016/j.entcs.2004.09.017.
  • [4] Martin Escardó (2007): Seemingly impossible functional programs. Available at http://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/.
  • [5] Martin Escardó (2008): Exhaustible sets in higher-type computation. Logical methods in computer science 4(3:3), pp. 1–37, doi:10.2168/lmcs-4(3:3)2008.
  • [6] Martin Escardó & Paulo Oliva (2010): Selection functions, bar recursion and backward induction. Mathematical structures in computer science 20(2), pp. 127–168, doi:10.1017/S0960129509990351.
  • [7] Martin Escardó & Paulo Oliva (2010): What sequential games, the Tychonoff theorem and the double-negation shift have in common. In: Proceedings of MSFP’10, doi:10.1145/1863597.1863605.
  • [8] Martin Escardó & Paulo Oliva (2011): Sequential games and optimal strategies. Proceedings of the Royal Society A 467, pp. 1519–1545, doi:10.1098/rspa.2010.0471.
  • [9] Martin Escardó & Paulo Oliva (2012): Computing Nash equilibria of unbounded games. Proceedings of the Turing centenary conference, Available at http://www.easychair.org/publications/paper/106503.
  • [10] Martin Escardó & Paulo Oliva (2017): The Herbrand functional interpretation of the double negation shift. Journal of symbolic logic 82(2), pp. 590–607, doi:10.1017/jsl.2017.8.
  • [11] Jules Hedges (2014): Monad transformers for backtracking search. In: Proceedings of MSFP’14, EPTCS, pp. 31–50, doi:10.4204/EPTCS.153.3.
  • [12] Jules Hedges (2015): The selection monad as a CPS translation. ArXiv:1503.06061.
  • [13] Jules Hedges (2016): Towards compositional game theory. Ph.D. thesis, Queen Mary University of London.
  • [14] Jules Hedges, Paulo Oliva, Evguenia Shprits, Viktor Winschel & Philipp Zahn (2017): Selection equilibria of higher-order games. In: Practical aspects of declarative languages, Lecture Notes in Computer Science 10137, Springer, pp. 136–151, doi:10.1007/978-3-319-51676-9_9.
  • [15] Pierre Lescanne & Matthieu Perrinel (2012): “Backward” coinduction, Nash equilibrium and the rationality of escalation. Acta Informatica 49(3), pp. 117–137, doi:10.1007/s00236-012-0153-3.
  • [16] Kevin Leyton-Brown & Yoav Shoham (2008): Essentials of game theory: a concise, multidisciplinary introduction. Morgan and Claypool.
  • [17] George Mailath & Larry Samuelson (2006): Repeated games and reputations. Oxford University Press, doi:10.1093/acprof:oso/9780195300796.001.0001.
  • [18] Andreu Mas-Colell, Michael Whinston & Jerry Green (1995): Microeconomic theory. Oxford University Press.
  • [19] Eugenio Moggi (1991): Notions of computation and monads. Information and Computation 93, pp. 55–92, doi:10.1016/0890-5401(91)90052-4.
  • [20] John Nash (1950): Equilibrium points in -person games. Proceedings of the National Academy of Sciences 36(1), pp. 48–49, doi:10.1073/pnas.36.1.48.
  • [21] John von Neumann & Oskar Morgenstern (1944): Theory of games and economic behaviour. Princeton University Press.
  • [22] Ulrich Schwalbe & Paul Walker (2001): Zermelo and the early history of game theory. Games and economic behaviour 34, pp. 123–137, doi:10.1006/game.2000.0794.
  • [23] Klaus Weihrauch (1995): A simple introduction to computable analysis. Technical Report, FernUniversität Hagen.