This article outlines a method for automatically generating models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. This is useful for designing empirically grounded agent-based simulations and for gaining direct insight into observed dynamic processes. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple approximations that explain most of the structure of complex stochastic processes. This method, implemented in C++ and R, scales well to large data sets. We apply our methods to empirical data from human subjects game experiments and international relations. We also demonstrate the method’s ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
2 Model Representation
This article describes a modeling method designed to understand data on dynamic decision-making. We have created a practical, easy-to-use software package implementing the method. Although our method is more broadly applicable, the motivation for the model representation was prediction of individual behavior in strategic interactions, i.e. games. Most behavioral game-theoretic treatments of repeated games use action-learning models that specify the way in which attractions to actions are updated by an agent as play progresses [Camerer, 2003]. Action learning models can perform poorly at predicting behavior in games where cooperation (e.g. Prisoner’s Dilemma) or coordination (e.g. Bach or Stravinsky) are key [Hanaki, 2004]. Also, they often fail to account for the effects of changes in information and player matching conditions [McKelvey and Palfrey, 2001]. In this paper, we model repeated game strategies as decision-making procedures that can explicitly consider the dynamic nature of the environment, e.g. if my opponent cooperated last period then I will cooperate this period. We represent decision-making with finite-state machines and use a genetic algorithm to estimate the values of the state transition tables. This combination of representation and optimization allows us to efficiently and effectively model dynamic decision-making.
Traditional game theories define strategies as complete contingent plans that specify how a player will act in every possible state; however, when the environment becomes even moderately complex the number of possible states of the world can grow beyond the limits of human cognition [Miller, 1996, Fudenberg et al., 2012]. One modeling response to cognitive limitations has been to exogenously restrict the complexity of repeated game strategies by representing them as Moore machines – finite state machines whose outputs depend only on their current state [Moore, 1956] – with a small number of states [Rubinstein, 1986, Miller, 1996, Hanaki et al., 2005]. Moore machines can model bounded rationality, explicitly treating procedures of decision-making [Osborne and Rubinstein, 1994]. A machine modeling agent $i$ responding to the actions of agent $j$ is a four-tuple $(Q_i, q_i^0, f_i, \tau_i)$, where $Q_i$ is the set of states, $q_i^0 \in Q_i$ is the initial state, $f_i : Q_i \to A_i$ is the output function mapping a state to an action, and $\tau_i : Q_i \times A_j \to Q_i$ (where $j \neq i$) is the transition function mapping a state and another agent’s action to a state [Osborne and Rubinstein, 1994]. We generalize this model beyond games by allowing for more inputs in $\tau_i$ than $A_j$, and by providing empirical rankings of these inputs that can be used to induce sparsity in more context-rich environments. The Moore machine can have many underlying states for a single observable action, allowing it to represent arbitrarily complex decision processes. The complexity is directly controlled by the number of states, which is a tuning parameter of our method that can be optimized by Algorithm 2 for predictive performance.
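To make the four-tuple concrete, the following is a minimal Python sketch of a Moore machine, written for illustration only and not taken from the package described in this article. The class name MooreMachine, its field names, and the coding of the opponent's action as 0 (cooperate) or 1 (defect) are assumptions of the sketch. It encodes tit-for-tat and grim trigger as two-state machines that respond to the opponent's last action.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MooreMachine:
    """Illustrative Moore machine: the output depends only on the current state."""
    actions: Sequence[str]                 # output function: state -> action
    transitions: Sequence[Sequence[int]]   # transition function: [state][input] -> next state
    initial_state: int = 0                 # initial state

    def run(self, inputs: Sequence[int]) -> List[str]:
        """Return the action played in the first period and after each input."""
        state = self.initial_state
        outputs = [self.actions[state]]
        for x in inputs:
            state = self.transitions[state][x]
            outputs.append(self.actions[state])
        return outputs


# Inputs are the opponent's last action: 0 = cooperate, 1 = defect (assumed coding).
tit_for_tat = MooreMachine(actions=("C", "D"), transitions=((0, 1), (0, 1)))
# Grim trigger never leaves the defecting state once it has entered it.
grim_trigger = MooreMachine(actions=("C", "D"), transitions=((0, 1), (1, 1)))

opponent_moves = [0, 1, 0, 0]  # C, D, C, C
tft_play = tit_for_tat.run(opponent_moves)
grim_play = grim_trigger.run(opponent_moves)
print(tft_play)
print(grim_play)
```

The two machines differ in a single transition entry, which is the kind of compact, inspectable difference the representation is meant to expose.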
Fig. 1 shows examples of finite state machines (FSMs) representing strategies for the Iterated Prisoner’s Dilemma game (see Section 4 for game details): The possible states are cooperate (C) and defect (D), and after initialization the current state is determined by the history of the player and her opponent cooperating or defecting (cc, cd, dc, dd) in the previous period.
Genetic algorithms (GAs) have been used to model agents updating beliefs based on endogenously determined variables in a general equilibrium environment [Bullard and Duffy, 1999], and agents learning to make economic decisions [Arifovic, 1994, Arifovic and Eaton, 1995, Marks et al., 1995, Midgley et al., 1997]. In contrast to investigations of GAs as models of agent learning and behavior, we use GAs to automatically generate interpretable agent decision models from empirical data. This is similar to work by Fogel [1993], Miller [1996], and Miller and Page [2007], in which GAs evolved FSMs based on their interactions with one another in simulated games, but whereas these were theoretical exercises, we are estimating models to explain and predict observed interactions among real agents. We use GAs as optimization routines for estimation because they perform well in rugged search spaces to quickly solve discrete optimization problems, are a natural complement to our binary string representation of FSMs [Goldberg and Holland, 1988], and are easily parallelized.
Duffy and Engle-Warnick [2002] combined empirical experimental data with genetic programming (GP) to model behavior. GP, with the same genetic operations as most GAs [Koza, 1992], is a process that can evolve arbitrary computer programs [Duffy, 2006]. We apply genetic operations to FSM representations rather than to all predictor variables and functional primitives because we are interested in deriving decision models with a particular structure: FSMs with latent states, rather than models conditioning on observable variables with any arbitrary functional form. With data-driven modeling, it is desirable to impose as many constraints as can be theoretically justified on the functional form of the model (see Miller and Page [2007] for interesting theoretical results related to FSM agents interacting in games). This avoids overfitting by constraining the model to a functional form that is likely generalizable across contexts, allows genetic selection to converge better, and reduces the computational effort required to explore parameter space. An additional challenge in implementing GP is specifying the genetic operations on the function primitives while ensuring that they will always produce syntactically valid programs that represent meaningful decision models. This requires fine-tuning to specific problems, which we avoid because we are designing a general method applicable across domains.
Our choice to use Moore machines as the building blocks of our decision modeling method ensures that estimation will produce easily interpretable models with latent states that can be represented graphically (see Fig. 1 for examples). Our process represents Moore machines as Gray-encoded binary strings consisting of an action vector followed by elements that form the state matrix [Savage, 1997]. For details, see Fig. 3(a) and our decode_stat_mat functions. This way, genetic operators can have free rein to search the global parameter space guided by the ability to predict provided data with the decoded binary strings.
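As a rough illustration of the encoding idea, the Python sketch below implements the standard binary-reflected Gray code and decodes a bitstring into an action vector and a state matrix. The layout (action entries first, then the state matrix row by row, each entry stored in a fixed number of bits) and the function names are assumptions made for this sketch; the package's own decoding functions may differ in detail.

```python
def gray_encode(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Inverse of gray_encode."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def decode_bitstring(bits, n_states, n_inputs, n_actions):
    """Decode a 0/1 list into (action vector, state matrix).

    Hypothetical layout: one Gray-coded entry per state for the action vector,
    followed by the state matrix in row-major order.
    """
    action_width = max(1, (n_actions - 1).bit_length())
    state_width = max(1, (n_states - 1).bit_length())

    def read(offset, width):
        chunk = bits[offset:offset + width]
        return gray_decode(int("".join(str(b) for b in chunk), 2))

    actions = [read(s * action_width, action_width) for s in range(n_states)]
    start = n_states * action_width
    matrix = [
        [read(start + (r * n_inputs + c) * state_width, state_width)
         for c in range(n_inputs)]
        for r in range(n_states)
    ]
    return actions, matrix


# A two-state, four-input machine needs 2 + 8 = 10 bits under this layout.
example_bits = [0, 1, 0, 0, 1, 1, 0, 0, 1, 1]
decoded_actions, decoded_matrix = decode_bitstring(example_bits, 2, 4, 2)
print(decoded_actions, decoded_matrix)
```

Consecutive integers differ in exactly one bit under Gray coding, so flipping a single bit moves a matrix entry to a neighboring value rather than an arbitrary one.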
The vast majority of computation time for Algorithm 1 is spent evaluating the predictive accuracy of the FSMs (not on the stochastic generation of candidate FSMs). To improve performance we implement this evaluation in C++ using the Rcpp package [Eddelbuettel, 2013], and, because it is embarrassingly parallel, distribute it across processor cores. We have incorporated our code into an R package with an API of documented function calls, and we use the GA package [Scrucca, 2013] to perform the GA evolution. A user can generate an optimized FSM by calling evolve_model(data), where data is an R data.frame object with columns representing the time period of the decision, the decision taken at that period, and any predictor variables. There are many additional optional arguments to the evolve_model function, but they have sensible default values. Our package then generates C++ code for a fitness function and uses it to evaluate automatically generated candidate models. Once the convergence criteria of this iterative search process are satisfied, the best FSM is identified, and each predictor variable is assessed by checking its identifiability and computing its importance in that decision model. The return value contains a descriptive summary of all results, including those shown in Fig. 3.
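The fitness evaluation can be pictured with the following Python sketch, which scores a candidate FSM by the share of decisions it predicts correctly. It is a simplified stand-in for the generated C++ fitness function, not a copy of it. The assumptions are that period 1 marks the start of a new interaction and resets the machine to its initial state, that the predictor columns have already been combined into a single column index per row, and that the name fsm_accuracy and the column order are hypothetical.

```python
def fsm_accuracy(actions, transitions, periods, decisions, inputs, initial_state=0):
    """Proportion of observed decisions that the FSM predicts correctly.

    periods[t]   time period of row t (1 starts a new interaction)
    decisions[t] observed decision in row t
    inputs[t]    column index summarizing the predictors for row t
                 (ignored when periods[t] == 1)
    """
    state = initial_state
    correct = 0
    for period, decision, x in zip(periods, decisions, inputs):
        if period == 1:
            state = initial_state           # new interaction: reset the machine
        else:
            state = transitions[state][x]   # move according to last period's outcome
        correct += int(actions[state] == decision)
    return correct / len(decisions)


# Tit-for-tat with assumed columns (own move, opponent move) = cc, dc, cd, dd.
tft_actions = ("C", "D")
tft_transitions = ((0, 0, 1, 1), (0, 0, 1, 1))

periods = [1, 2, 3, 4, 1, 2]
inputs = [0, 2, 1, 3, 0, 0]
decisions = ["C", "D", "D", "D", "C", "C"]

score = fsm_accuracy(tft_actions, tft_transitions, periods, decisions, inputs)
print(score)
```

Because each candidate is scored independently of the others, this evaluation is the step that can be spread across processor cores.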
The number of states in the FSM and the number of predictor variables to include are hyper-parameters that control the complexity of the model. Beginning with the simplest possible model and increasing complexity by adding states and variables, we often observe that at first, out-of-sample predictive accuracy grows because bias falls more quickly than variance rises; but eventually, adding further complexity reduces bias less than it increases variance, so accuracy decreases [Hastie et al., 2009]. We can use cross-validation on the training data to find values for the hyper-parameters that maximize predictive accuracy (Algorithm 2). We assess the out-of-sample predictive accuracy of the final model with a hold-out test set of data, distinct from the cross-validation test sets in Algorithm 2. Increasing complexity to optimize predictive accuracy introduces a new trade-off because more complex decision models are harder to interpret in human terms, so the “best” solution will depend on the goals of the analysis.
4 Experimental Game Data
The Iterated Prisoner’s Dilemma (IPD) is often used as a model of cooperation [Axelrod, 1984]. A one-shot PD game has a unique equilibrium in which each player chooses to defect even though both players would be better off if they cooperated. Suppose two players play the simultaneous-move PD game in Fig. 2, observe the choice of the other person, and then play the same simultaneous-move game again. Even in the (finitely) repeated version, no cooperation can be achieved by rational income maximizers. This tension between maximizing collective and individual gain is representative of a broad class of social situations (e.g. the “tragedy of the commons” [Hardin, 1968]).
We applied our procedure to data from laboratory experiments on human subjects playing IPD games for real financial incentives. Nay [2014] gathered and integrated data from many experiments, conducted and analyzed by Bereby-Meyer and Roth [2006], Duffy and Ochs [2009], Kunreuther et al. [2009], Dal Bo and Frechette [2011], and Fudenberg et al. [2012]. All of the experiments share the same underlying repeated Prisoner’s Dilemma structure, although the details of the games differed. Nay’s data set comprises 135,388 cooperation decisions, which is much larger than the data sets used in previous studies of repeated game strategies.
Fudenberg et al. [2012] and Dal Bo and Frechette [2011] modeled their IPD experimental data with repeated game strategies; however, they applied a maximum likelihood estimation process to estimate the prevalence of a relatively small predefined set of strategies. In contrast, our estimation process automatically searches through a very large parameter space that includes all possible strategies up to a given number of states and does not require the analyst to predefine any strategies, or even understand the game.
We used 80% of our data for training and reserved the other 20% as a hold-out test set. Fig. 3 shows different representations of the fittest two-state machine of a GA population evolved on the training data: The raw Gray-encoded binary string (Fig. 3(a)), the bitstring decoded into state matrix and action vector form (Fig. 3(b)), and the corresponding graph representation (Fig. 3(c)). We measure variable importance (Fig. 3(d)) by switching each value of an estimated model’s state matrix to another value in its feasible range, measuring the decrease in goodness of fit to the training data, normalizing the values, then summing across each column to estimate the relative importance of each predictor variable (in this case, the moves each player made in the previous turn).
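The variable importance calculation described above can be sketched in Python as follows. This is an illustrative reading of the procedure rather than the package's implementation: the function names, the column ordering (own move, opponent move) = cc, dc, cd, dd, and the short tit-for-tat data series are all assumptions made for the example.

```python
def accuracy(actions, transitions, inputs, decisions):
    """Share of decisions predicted correctly; decisions[0] uses the initial state 0."""
    state = 0
    correct = int(actions[state] == decisions[0])
    for x, d in zip(inputs, decisions[1:]):
        state = transitions[state][x]
        correct += int(actions[state] == d)
    return correct / len(decisions)


def variable_importance(actions, transitions, inputs, decisions):
    """Normalized drop in accuracy from perturbing each matrix entry, summed by column."""
    n_states, n_cols = len(transitions), len(transitions[0])
    base = accuracy(actions, transitions, inputs, decisions)
    drops = [[0.0] * n_cols for _ in range(n_states)]
    for s in range(n_states):
        for c in range(n_cols):
            for alt in range(n_states):
                if alt == transitions[s][c]:
                    continue
                perturbed = [list(row) for row in transitions]
                perturbed[s][c] = alt
                loss = base - accuracy(actions, perturbed, inputs, decisions)
                drops[s][c] += max(0.0, loss)
    total = sum(sum(row) for row in drops)
    if total == 0:
        return [0.0] * n_cols
    return [sum(drops[s][c] for s in range(n_states)) / total for c in range(n_cols)]


# Seven periods of deterministic tit-for-tat play (illustrative data).
tft_actions = ("C", "D")
tft_transitions = ((0, 0, 1, 1), (0, 0, 1, 1))
inputs = [2, 1, 2, 3, 1, 0]
decisions = ["C", "D", "C", "D", "D", "C", "C"]

importance = variable_importance(tft_actions, tft_transitions, inputs, decisions)
print(importance)
```

In this toy series the columns that are visited more often receive more weight, which is why importance should be read relative to the data at hand.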
Fig. 4 illustrates the GA run that evolved the FSM of Fig. 3 by predicting cooperation decisions in IPD training data games. This GA run, which only took a few seconds on a modest laptop, used common algorithm settings: a population of 175 FSMs initialized with random bitstrings. If the analyst has an informed prior belief about the subjects’ decision models, she can initialize the population with samples drawn from that prior distribution, but this paper focuses on deriving useful results from random initializations, corresponding to uniform priors, where the analyst only provides data. A linear-rank selection process used the predictive ability of individuals to select a subset of the population from which to create the next generation. A single-point crossover process was applied to the binary values of selected individuals with 0.8 probability, uniform random mutation was conducted with probability 0.1, and the top 5% fittest individuals survived each generation without crossover or mutation, ensuring that potentially very good solutions would not be lost [Scrucca, 2013]. These are standard GA parameter settings and can be adjusted if convergence is taking particularly long for a given dataset.
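For readers unfamiliar with these operators, the following Python sketch strings them together into a small GA over bitstrings, using the settings quoted above (population 175, crossover probability 0.8, mutation probability 0.1, 5% elitism). It is a simplified illustration and not the GA package's code; in particular, mutation here flips one randomly chosen bit, and the toy fitness function (the number of bits matching a fixed target string) stands in for predictive accuracy on data.

```python
import random


def linear_rank_select(population, fitnesses, rng):
    """Draw one individual with probability proportional to its fitness rank (worst = 1)."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranks = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return rng.choices(population, weights=ranks, k=1)[0]


def single_point_crossover(a, b, rng):
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rng):
    """Flip one randomly chosen bit (a simplification assumed for this sketch)."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


def evolve(fitness, n_bits, pop_size=175, p_cross=0.8, p_mut=0.1,
           elite_frac=0.05, generations=30, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    n_elite = max(1, round(elite_frac * pop_size))
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        ranked = sorted(zip(fits, pop), key=lambda pair: pair[0], reverse=True)
        next_pop = [ind[:] for _, ind in ranked[:n_elite]]  # elites survive unchanged
        while len(next_pop) < pop_size:
            a = linear_rank_select(pop, fits, rng)
            b = linear_rank_select(pop, fits, rng)
            if rng.random() < p_cross:
                a, b = single_point_crossover(a, b, rng)
            for child in (a, b):
                next_pop.append(mutate(child, rng) if rng.random() < p_mut else child[:])
        pop = next_pop[:pop_size]
    return max(pop, key=fitness)


# Toy fitness: agreement with a fixed 10-bit target string.
TARGET = [0, 1, 0, 0, 1, 1, 0, 0, 1, 1]


def toy_fitness(bits):
    return sum(int(x == y) for x, y in zip(bits, TARGET))


best = evolve(toy_fitness, n_bits=len(TARGET))
print(best, toy_fitness(best))
```

Elitism guarantees that the best fitness in the population never decreases from one generation to the next, which is the property the text refers to when it says good solutions are not lost.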
Using theoretical agent-based simulations and a fitness measure that is a function of simulated payoffs, Axelrod  demonstrated the fitness of the tit-for-tat (TFT) strategy. Using a fitness measure that is a function of the ability to explain human behavior, we discovered a hybrid of TFT and grim trigger (GT), which we call noisy grim (NG). TFT’s state is determined solely by the opponent’s last play. GT will never exit a defecting state, no matter what the opponent does.
With traditional repeated game strategies such as TFT and GT, the player always takes the action corresponding to her current state (boldface transitions in Fig. 3(c)), but if we add noise to decisions so the player will sometimes choose the opposite action from her current state (italic transitions in Fig. 3(c)), then the possibility arises for both the player and opponent to cooperate when the player is in the defecting state (i.e. to reach the second row first column position of the state matrix in Fig. 3(b)). This would return the player to the cooperating state (see, e.g., Chong and Yao [2005]). Noisy grim’s predictions on the hold-out test data are 82% accurate, GT’s are 72% accurate, and TFT’s are 77% accurate. We also tested 16 other repeated game strategies for the IPD from Fudenberg et al. [2012]. Their accuracy on the test set ranged from 46% to 77%. Our method uncovered a deterministic dynamic decision model that predicts IPD play better than all of the existing theoretical automata models of IPD play that we are aware of and has interesting relationships to the two most well-known models: TFT and GT.
This process has allowed us to estimate a highly interpretable decision model (fully represented by the small image of Fig. 3(c)) that predicts most of the behavior of hundreds of human participants, merely by plugging in the dataset as input. We address the potential concern that the process is too tuned to this specific case study by inputting a very different dataset from the field of international relations and obtaining useful results. First, however, we test how robustly we can estimate a known model: before moving to more empirical data, where the data-generating process can never be fully known, we repeatedly simulate a variety of known data-generating mechanisms and then apply the method to the resulting choice data.
5 Simulated Data
In the real world, people rarely strategically interact by strictly following a deterministic strategy [Chong and Yao, 2005]. Whimsy, strategic randomization, or error may induce a player to choose a different move from the one dictated by her strategy. To study whether our method could determine an underlying strategy that an agent would override from time to time, we followed the approach of Fudenberg et al. [2012] and created an agent-based model of the IPD in which agents followed deterministic strategies, such as TFT and GT, but made noisy decisions: At each time period, the deterministic strategy dictates each agent’s preferred action, but the agent will choose the opposite action with a fixed probability, the noise parameter, which ranges from 0 (perfectly deterministic play) to 0.5 (completely random play). The noise parameter is constant across all states of a strategy of a particular agent for any given simulation experiment we conducted.
When a player follows an unknown strategy, characterized by latent states, discovering the strategy (the actions corresponding to each state and transitions between the states) requires observed data that explores as much as possible of the state transition matrix defined by all possible combinations of state and predictor values (for these strategies the predictors are the history of play). Many deterministic strategy pairings can quickly reach equilibria in which players repeat the same moves for the rest of the interaction. If the player and opponent both use TFT and both make the same first move, every subsequent move will repeat the first move. If the opponent plays GT, then after the first time the player defects the opponent will defect for the rest of the session and the data will provide little information on the player’s response to cooperation by the opponent. However, if the opponent plays with noise, the play will include many instances of cooperation and defection by the opponent, and will thus sample the accessible state space for the player’s strategy more thoroughly than if the opponent plays deterministically. Indeed, this is why Fudenberg et al. [2012] added noise to action choices in their human subjects experimental games.
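The noisy-play design can be illustrated with a short Python simulation. The sketch is not the agent-based model used for the experiments reported here: the dictionary layout, the column ordering (own move, opponent move) = cc, dc, cd, dd, and the version of grim trigger that reacts only to the opponent's defection are assumptions made for the example.

```python
import random

# Column index for last period's (own move, opponent move); assumed ordering.
COLUMN = {("C", "C"): 0, ("D", "C"): 1, ("C", "D"): 2, ("D", "D"): 3}

TFT = {"actions": ("C", "D"), "transitions": ((0, 0, 1, 1), (0, 0, 1, 1))}
GT = {"actions": ("C", "D"), "transitions": ((0, 0, 1, 1), (1, 1, 1, 1))}


def flip(action):
    return "D" if action == "C" else "C"


def simulate(player, opponent, noise_player, noise_opponent, n_periods, seed=0):
    """Return a list of (player move, opponent move) pairs, one per period."""
    rng = random.Random(seed)
    state_p = state_o = 0
    history = []
    for _ in range(n_periods):
        move_p = player["actions"][state_p]
        move_o = opponent["actions"][state_o]
        # Each agent overrides its strategy with its own noise probability.
        if rng.random() < noise_player:
            move_p = flip(move_p)
        if rng.random() < noise_opponent:
            move_o = flip(move_o)
        history.append((move_p, move_o))
        state_p = player["transitions"][state_p][COLUMN[(move_p, move_o)]]
        state_o = opponent["transitions"][state_o][COLUMN[(move_o, move_p)]]
    return history


deterministic = simulate(TFT, TFT, 0.0, 0.0, n_periods=5)
noisy_opponent = simulate(TFT, GT, 0.0, 0.25, n_periods=4000)
print(deterministic)
print(sum(1 for _, move_o in noisy_opponent if move_o == "D"), "opponent defections")
```

Without noise, two tit-for-tat players who open with cooperation never leave mutual cooperation, so the data reveal nothing about how either responds to defection. With a noisy opponent the player's strategy is exercised against both moves.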
We simulated approximately 17 million interactions, varying paired decision models of each agent [(TFT, TFT), (TFT, GT), (GT, TFT), (GT, GT)] and also varying the noise parameter (0, 0.025, … , 0.5) for each of two noise conditions: where both players made equally noisy decisions, and where only the opponent made noisy decisions while the player under study strictly followed a deterministic strategy. We ran 25 replicates of each of the 168 experimental conditions, with 4,000 iterations of game play for each replicate, and then applied the FSM estimation method to each replicate of the simulated choice data to estimate the strategy that the agent player under study was using.
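The size of this design can be checked with a few lines of Python. The sketch simply multiplies out the factors stated in the text; the variable names are illustrative.

```python
pairings = [("TFT", "TFT"), ("TFT", "GT"), ("GT", "TFT"), ("GT", "GT")]
noise_levels = [round(0.025 * i, 3) for i in range(21)]   # 0, 0.025, ..., 0.5
noise_conditions = ["both players noisy", "only opponent noisy"]

n_conditions = len(pairings) * len(noise_levels) * len(noise_conditions)
replicates_per_condition = 25
iterations_per_replicate = 4000
total_interactions = n_conditions * replicates_per_condition * iterations_per_replicate

print(n_conditions)         # 168 experimental conditions
print(total_interactions)   # 16,800,000, i.e. approximately 17 million
```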
Being in a given state/row (e.g. 2) corresponds to the player taking a particular action (e.g. D) in the current turn. All entries in that row corresponding to the player taking that action in the current period (e.g. columns 2 and 4 for D) are identifiable. Entries in that row that correspond to not taking that action in the current period (e.g. columns 1 and 3 for row 2) represent transitions that cannot occur in strictly deterministic play, so their values cannot affect play and thus cannot be determined empirically. We take this into account when testing the method’s ability to estimate underlying deterministic models: this is why only 6 elements of a 10-element TFT or GT matrix can be identified (Fig. 5). We also take this into account when estimating models from empirical data, where the data-generating process is assumed to be stochastic: each element of the matrix that would be inaccessible under deterministic play is identified, and the fitness is calculated with a strategy matrix in which that element is changed to its complement (“flipped”). If flipping the element does not change the fitness, then the two complementary strategies are indistinguishable and the element in question cannot be determined empirically. If each element decreases the fitness when it is flipped, then the strategy corresponds to a deterministic approximation of a stochastic process and all of the elements of the state matrix can be identified.
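The flip test can be illustrated with the Python sketch below, which applies it to a short series of strictly deterministic tit-for-tat play. The function names, the data, and the column ordering (own move, opponent move) = cc, dc, cd, dd are assumptions of the example; the ordering was chosen so that columns 2 and 4 correspond to the player having defected, as in the text.

```python
def accuracy(actions, transitions, inputs, decisions):
    """Share of decisions predicted correctly; decisions[0] uses the initial state 0."""
    state = 0
    correct = int(actions[state] == decisions[0])
    for x, d in zip(inputs, decisions[1:]):
        state = transitions[state][x]
        correct += int(actions[state] == d)
    return correct / len(decisions)


def identifiable_entries(actions, transitions, inputs, decisions):
    """Return the (row, column) entries whose flip changes the fit to the data."""
    base = accuracy(actions, transitions, inputs, decisions)
    found = set()
    for s, row in enumerate(transitions):
        for c, target in enumerate(row):
            flipped = [list(r) for r in transitions]
            flipped[s][c] = 1 - target          # two states, so the complement is 1 - value
            if accuracy(actions, flipped, inputs, decisions) != base:
                found.add((s, c))
    return found


# Deterministic tit-for-tat play against opponent moves D, C, D, D, C, C.
tft_actions = ("C", "D")
tft_transitions = ((0, 0, 1, 1), (0, 0, 1, 1))
inputs = [2, 1, 2, 3, 1, 0]
decisions = ["C", "D", "C", "D", "D", "C", "C"]

entries = identifiable_entries(tft_actions, tft_transitions, inputs, decisions)
print(sorted(entries))
```

With zero-based indices, the entries found are columns 1 and 3 of the cooperating row and columns 2 and 4 of the defecting row in the text's one-based numbering. Together with the two action-vector elements, that gives the 6 identifiable elements out of 10.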
When the noise parameter was zero, most of the models estimated by the GA had at least two incorrect elements. However, for moderate amounts of noise (–), all of the models estimated by the GA were correct (see Fig. 4(a)). For noise levels above in the player, the amount of error rose rapidly with , as expected because at the action the player chooses moves completely at random so there is no strategy to discover. When a strictly deterministic player faced a noisy opponent, the GA correctly identified the player’s strategy for all noise levels above (see Fig. 4(b)).
6 Observational Data
In order to extend this method to more complex situations the predictor variables (columns of the state matrices) can include any time-varying variable relevant to an agent’s decision. In context-free games such as the IPD, the only predictor variables are the moves the players made in the previous turn, but models of strategic interactions in context-rich environments may include other relevant variables.
We find it difficult to interpret graphical models with more than four predictors, but an analyst who had many potentially relevant predictor variables and was unable to use theory alone to reduce the number of predictors sufficiently to generate easily interpretable models with our method could take four courses of action (listed in order of increasing reliability and computation time):
- Before FSM estimation, apply a (multivariate or univariate) predictor variable selection method.
- Before FSM estimation, estimate an arbitrary predictive model that can produce variable importance rankings and then use the top $k$ predictors for FSM estimation.
- After FSM estimation with all $p$ candidate predictors, inspect the returned predictor variable importance ranking, remove all but the top $k$ predictors from her dataset, and re-run estimation.
- Conduct FSM estimation with all combinations of $k$ predictors out of all $p$ relevant predictors and choose the estimated model with the best performance (usually highest out-of-sample accuracy).
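The last option grows quickly with the number of candidates, which a short Python sketch makes concrete. The predictor names are hypothetical placeholders, and the sketch only enumerates the subsets that would each require a separate FSM estimation; it does not perform the estimation.

```python
from itertools import combinations
from math import comb


def candidate_predictor_sets(predictors, k):
    """All subsets of size k, each of which would need its own FSM estimation."""
    return [list(subset) for subset in combinations(predictors, k)]


# Hypothetical predictor names, for illustration only.
predictors = ["conflict_lag", "cooperation_lag", "treaty_lag", "x4", "x5"]
subsets = candidate_predictor_sets(predictors, k=3)

print(len(subsets))   # number of separate estimations for p = 5, k = 3
print(comb(20, 4))    # the count for p = 20, k = 4
```

The count is the binomial coefficient of p and k, so exhaustive search is only practical when the candidate set is small, which is why it is listed as the most computationally expensive option.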
We illustrate the use of extra predictor variables by applying our method to an example from international relations involving repeated water management-related interactions between countries that share rivers. We use data compiled by Brochmann [2012] on treaty signing and cooperation over water quality, water quantity, and flood control from 1948–1999 to generate a model for predicting whether two countries will cooperate. We used three lagged variables: whether there was water-related conflict between them in the previous year, whether they cooperated around water in the previous year, and whether they had signed a water-related treaty during any previous year. This data set was too small to divide into training and hold-out subsets for assessing predictive accuracy, so we report models’ accuracy in reproducing the training data (a random choice model is 50% accurate). A two-state decision model (Fig. 5(a)) is 73% accurate, a three-state model (Fig. 5(c)) is 78% accurate, and a four-state model is 82% accurate, but its complexity makes it difficult to interpret visually so it is not shown.
Accuracy can be a problematic measure when the classes are imbalanced, i.e. if a class the model is trying to predict is rare. Many alternatives to accuracy are available that illuminate different aspects of predictive power. For instance, precision is the proportion of (cooperation) event signals predicted by our models that are correct and recall is the proportion of events that are predicted by our models. For this subset of the dataset, cooperate and not cooperate were almost evenly distributed and to maintain a comparison to the experimental and simulated data we used accuracy as the fitness measure.
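The difference between these measures is easy to see in a small Python sketch. The functions follow the standard definitions; returning 0.0 when a denominator is zero is a convention chosen for this example, and the data are invented to show the effect of class imbalance.

```python
def confusion_counts(predicted, actual):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    return tp, fp, fn, tn


def accuracy(predicted, actual):
    tp, fp, fn, tn = confusion_counts(predicted, actual)
    return (tp + tn) / len(actual)


def precision(predicted, actual):
    tp, fp, _, _ = confusion_counts(predicted, actual)
    return tp / (tp + fp) if (tp + fp) else 0.0   # 0.0 by convention when nothing is predicted


def recall(predicted, actual):
    tp, _, fn, _ = confusion_counts(predicted, actual)
    return tp / (tp + fn) if (tp + fn) else 0.0   # 0.0 by convention when there are no events


# Imbalanced example: one cooperation event (1) in ten observations.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
always_no = [0] * 10

print(accuracy(always_no, actual))   # high, although the model never predicts an event
print(recall(always_no, actual))     # zero: the single event is missed
```

A model that never predicts cooperation scores 90% accuracy here while detecting none of the events, which is the failure that precision and recall expose.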
In the two-state model, whether or not the countries cooperated in the previous year, the combination of conflict and treaty-signing in the previous year always produces cooperation, whereas conflict without treaty-signing in the previous year always produces non-cooperation. In the three-state model, three of the four outcomes that include conflict lead to a transition from non-cooperation to cooperation, and four of the six outcomes that cause transitions from either of the two cooperation states to non-cooperation are non-conflict outcomes. While this does not tell us something decisive about the role of conflict, it suggests that there may be a counter-intuitive role of conflict in promoting cooperation. Brochmann [2012], using a bivariate probit simultaneous equation model, has a similar finding: “In the aftermath of conflict, states may be particularly eager to solve important issues that could cause future problems” (p. 159).
This paper outlined a method for estimating interpretable models of dynamic decision-making. By estimating a global, deterministic, simple function for a given dataset, imposing constraints on the number of predictor variables, and providing options for reducing the number of predictor variables, our process facilitates capturing a significant amount of information in a compact and useful form. The method can be used for designing empirically grounded agent models in agent-based simulations and for gaining direct insight into observed behaviors of real agents in social and physical systems. Combining state matrices and a genetic algorithm has proven effective for simulated data, experimental game data, and observational international relations data. With the simulated data, we successfully recovered the exact underlying models that generated the data. With the real data, we estimated simple deterministic approximations that explain most of the structure of the unknown underlying process. We discovered a theoretically interesting dynamic decision model that predicted IPD play better than all of the existing theoretical models of IPD play that we were aware of.
We have released an open-source R package that implements the methods described here to estimate any time series classification model that uses a small number of binary predictor variables and moves back and forth between the values of the outcome variable over time. Larger sets of predictor variables can be reduced to smaller sets by applying one of the four methods outlined in Section 6. Although the predictor variables must be binary, a quantitative variable can be converted into binary by division of the observed values into high/low classes. Future releases of the package may include additional estimation methods to complement GA optimization.
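One simple way to perform the high/low conversion mentioned above is a median split, sketched below in Python. The choice of the median as the cut point, the strict inequality, and the function name are assumptions made for illustration; other thresholds may suit a given variable better.

```python
from statistics import median


def binarize_by_median(values):
    """Code each value as 1 (high) if it exceeds the sample median, else 0 (low)."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


# Hypothetical quantitative predictor observed over five periods.
observed = [3.2, 1.0, 4.8, 2.5, 9.1]
binary = binarize_by_median(observed)
print(binary)
```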
This work was supported by U.S. National Science Foundation grants EAR-1416964 and EAR-1204685.
- Arifovic [1994] Jasmina Arifovic. Genetic algorithm learning and the cobweb model. Journal of Economic Dynamics and Control, 18:3–28, 1994.
- Arifovic and Eaton [1995] Jasmina Arifovic and Curtis Eaton. Coordination via genetic learning. Computational Economics, 8:181–203, 1995. doi: 10.1007/BF01298459.
- Axelrod [1984] Robert M. Axelrod. The Evolution of Cooperation. Basic Books, New York, 1984.
- Axelrod [1997] Robert M. Axelrod. The Complexity of Cooperation: Agent-based Models of Competition and Collaboration. Princeton University Press, Princeton, 1997.
- Bereby-Meyer and Roth [2006] Yoella Bereby-Meyer and Alvin E. Roth. The speed of learning in noisy games: Partial reinforcement and the sustainability of cooperation. The American Economic Review, 96:1029–1042, 2006.
- Brochmann [2012] Marit Brochmann. Signing river treaties: Does it improve river cooperation? International Interactions, 38:141–163, 2012. doi: 10.1080/03050629.2012.657575.
- Bullard and Duffy [1999] James Bullard and John Duffy. Using genetic algorithms to model the evolution of heterogeneous beliefs. Computational Economics, 13:41–60, 1999. doi: 10.1023/A:1008610307810.
- Camerer [2003] Colin F. Camerer. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press, Princeton, 2003.
- Chong and Yao [2005] S.Y. Chong and Xin Yao. Behavioral diversity, choices and noise in the iterated prisoner’s dilemma. IEEE Transactions on Evolutionary Computation, 9:540–551, 2005.
- Dal Bo and Frechette [2011] Pedro Dal Bo and Guillaume R Frechette. The evolution of cooperation in infinitely repeated games: Experimental evidence. American Economic Review, 101:411–429, 2011. doi: 10.1257/aer.101.1.411.
- Duffy [2006] John Duffy. Agent-based models and human subject experiments. In Handbook of Computational Economics, volume 2, pages 949–1011. Elsevier, Amsterdam, 2006.
- Duffy and Engle-Warnick [2002] John Duffy and Jim Engle-Warnick. Using symbolic regression to infer strategies from experimental data. In Evolutionary Computation in Economics and Finance, pages 61–82. Springer, New York, 2002.
- Duffy and Ochs [2009] John Duffy and Jack Ochs. Cooperative behavior and the frequency of social interaction. Games and Economic Behavior, 66:785–812, 2009. doi: 10.1016/j.geb.2008.07.003.
- Eddelbuettel [2013] Dirk Eddelbuettel. Seamless R and C++ Integration with Rcpp. Springer, New York, 2013.
- Fogel [1993] David B. Fogel. Evolving behaviors in the iterated prisoner’s dilemma. Evolutionary Computation, 1:77–97, 1993.
- Fudenberg et al. [2012] Drew Fudenberg, David G Rand, and Anna Dreber. Slow to anger and fast to forgive: Cooperation in an uncertain world. American Economic Review, 102:720–749, 2012. doi: 10.1257/aer.102.2.720.
- Goldberg and Holland [1988] David E. Goldberg and John H. Holland. Genetic algorithms and machine learning. Machine Learning, 3:95–99, 1988. doi: 10.1023/A:1022602019183.
- Hanaki [2004] Nobuyuki Hanaki. Action learning versus strategy learning. Complexity, 9:41–50, 2004.
- Hanaki et al. [2005] Nobuyuki Hanaki, Rajiv Sethi, Ido Erev, and Alexander Peterhansl. Learning strategies. Journal of Economic Behavior & Organization, 56:523–542, 2005. doi: 10.1016/j.jebo.2003.12.004.
- Hardin [1968] Garrett Hardin. The tragedy of the commons. Science, 162:1243–1248, 1968. doi: 10.1126/science.162.3859.1243.
- Hastie et al. [2009] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, NY, 2nd edition, 2009. ISBN 9780387848570.
- Koza [1992] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Bradford, Cambridge, MA, 1992.
- Kunreuther et al. [2009] Howard Kunreuther, Gabriel Silvasi, Eric T. Bradlow, and Dylan Small. Bayesian analysis of deterministic and stochastic prisoner’s dilemma games. Judgment and Decision Making, 4:363–384, 2009.
- Marks et al. [1995] Robert E. Marks, David F. Midgley, and Lee G. Cooper. Adaptive behaviour in an oligopoly. In Jörg Biethahn and Volker Nissen, editors, Evolutionary Algorithms in Management Applications, pages 225–239. Springer, New York, 1995.
- McKelvey and Palfrey [2001] Richard D. McKelvey and Thomas R. Palfrey. Playing in the dark: Information, learning, and coordination in repeated games. Technical report, California Institute of Technology, Pasadena, 2001.
- Midgley et al. [1997] David F. Midgley, Robert E. Marks, and Lee G. Cooper. Breeding competitive strategies. Management Science, 43:257–275, 1997. doi: 10.1287/mnsc.43.3.257.
- Miller [1996] John H. Miller. The coevolution of automata in the repeated prisoner’s dilemma. Journal of Economic Behavior & Organization, 29:87–112, 1996.
- Miller and Page [2007] John H. Miller and Scott E. Page. Complex Adaptive Systems: An Introduction to Computational Models of Social Life. Princeton University Press, Princeton, 2007.
- Moore [1956] Edward Moore. Gedanken-experiments on sequential machines. Automata Studies, 34:129–153, 1956.
- Nay [2014] John Jacob Nay. Predicting cooperation and designing institutions: An integration of behavioral data, machine learning, and simulation. In Winter Simulation Conference Proceedings, Savannah, GA, December 2014.
- Osborne and Rubinstein [1994] Martin J. Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, Cambridge, MA, 1994.
- R Core Team [2014] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2014. URL http://www.R-project.org/.
- Rubinstein [1986] Ariel Rubinstein. Finite automata play the repeated prisoner’s dilemma. Journal of Economic Theory, 39(1):83–96, June 1986. ISSN 0022-0531. doi: 10.1016/0022-0531(86)90021-9. URL http://www.sciencedirect.com/science/article/pii/0022053186900219.
- Savage [1997] C. Savage. A Survey of Combinatorial Gray Codes. SIAM Review, 39(4):605–629, January 1997. ISSN 0036-1445. doi: 10.1137/S0036144595295272. URL http://epubs.siam.org/doi/abs/10.1137/S0036144595295272.
- Scrucca [2013] Luca Scrucca. GA: A package for genetic algorithms in R. Journal of Statistical Software, 53:1–37, 2013. URL http://www.jstatsoft.org/v53/i04/.
- Xie [2014] Yihui Xie. Dynamic Documents with R and knitr. Chapman & Hall/CRC, Boca Raton, 2014.