1 Abstract
This article outlines a method for automatically generating models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. This is useful for designing empirically grounded agent-based simulations and for gaining direct insight into observed dynamic processes. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple approximations that explain most of the structure of complex stochastic processes. This method, implemented in C++ and R, scales well to large data sets. We apply our methods to empirical data from human-subjects game experiments and international relations. We also demonstrate the method’s ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
2 Model Representation
This article describes a modeling method designed to understand data on dynamic decision-making. We have created a practical, easy-to-use software package implementing the method. Although our method is more broadly applicable, the motivation for the model representation was prediction of individual behavior in strategic interactions, i.e. games. Most behavioral game-theoretic treatments of repeated games use action-learning models that specify the way in which attractions to actions are updated by an agent as play progresses [Camerer, 2003]. Action-learning models can perform poorly at predicting behavior in games where cooperation (e.g. Prisoner’s Dilemma) or coordination (e.g. Bach or Stravinsky) is key [Hanaki, 2004]. They also often fail to account for the effects of changes in information and player-matching conditions [McKelvey and Palfrey, 2001]. In this paper, we model repeated game strategies as decision-making procedures that can explicitly consider the dynamic nature of the environment, e.g. if my opponent cooperated last period then I will cooperate this period. We represent decision-making with finite-state machines and use a genetic algorithm to estimate the values of the state transition tables. This combination of representation and optimization allows us to efficiently and effectively model dynamic decision-making.
Traditional game theories define strategies as complete contingent plans that specify how a player will act in every possible state; however, when the environment becomes even moderately complex, the number of possible states of the world can grow beyond the limits of human cognition [Miller, 1996, Fudenberg et al., 2012]. One modeling response to cognitive limitations has been to exogenously restrict the complexity of repeated game strategies by representing them as Moore machines – finite-state machines whose outputs depend only on their current state [Moore, 1956] – with a small number of states [Rubinstein, 1986, Miller, 1996, Hanaki et al., 2005]. Moore machines can model bounded rationality, explicitly treating procedures of decision-making [Osborne and Rubinstein, 1994]. A machine modeling agent i responding to the actions of agent j is a four-tuple (Q, q0, f, τ), where Q is the set of states, q0 is the initial state, f is the output function mapping a state to an action, and τ (where τ : Q × A_j → Q, with A_j the other agent’s action set) is the transition function mapping a state and the other agent’s action to a state [Osborne and Rubinstein, 1994]. We generalize this model beyond games by allowing for more inputs in τ than the other agent’s action alone, and by providing empirical rankings of these inputs that can be used to induce sparsity in more context-rich environments. The Moore machine can have many underlying states for a single observable action, allowing it to represent arbitrarily complex decision processes. The complexity is directly controlled by the number of states, which is a tuning parameter of our method that can be optimized by Algorithm 2 for predictive performance. Fig. 1 shows examples of finite-state machines (FSMs) representing strategies for the Iterated Prisoner’s Dilemma game (see Section 4 for game details): The possible states are cooperate (C) and defect (D), and after initialization the current state is determined by the history of the player and her opponent cooperating or defecting (cc, cd, dc, dd) in the previous period.
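As a concrete illustration of the state-matrix representation, the following sketch encodes tit-for-tat as a two-state machine and drives it with an opponent's move history. The names (`MooreMachine`, `play`) and the column convention are ours, not from the package; here outcome columns are ordered cc, cd, dc, dd with the opponent's previous move listed first and the player's own previous move second.

```cpp
#include <array>
#include <string>
#include <vector>

// Minimal Moore machine for the Iterated Prisoner's Dilemma: an action
// vector mapping each state to C or D, and a transition table mapping
// (state, previous-period outcome) to the next state.
struct MooreMachine {
    std::vector<char> action;               // action[s] is 'C' or 'D'
    std::vector<std::array<int, 4>> trans;  // trans[s][cc=0, cd=1, dc=2, dd=3]
    int initial;                            // starting state
};

// Column index for last period's outcome (opponent's move, own move).
int outcome_index(char opp, char own) {
    return (opp == 'D' ? 2 : 0) + (own == 'D' ? 1 : 0);
}

// Play the machine against a fixed sequence of opponent moves.
std::string play(const MooreMachine& m, const std::string& opponent) {
    std::string moves;
    int s = m.initial;
    for (char opp : opponent) {
        char own = m.action[s];
        moves += own;
        s = m.trans[s][outcome_index(opp, own)];
    }
    return moves;
}
```

Tit-for-tat is then `action = {'C','D'}` with transition rows `{0,0,1,1}` for both states, since its next state depends only on whether the opponent cooperated.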
3 Estimation
Genetic algorithms (GAs) have been used to model agents updating beliefs based on endogenously determined variables in a general equilibrium environment [Bullard and Duffy, 1999], and agents learning to make economic decisions [Arifovic, 1994, Arifovic and Eaton, 1995, Marks et al., 1995, Midgley et al., 1997]. In contrast to investigations of GAs as models of agent learning and behavior, we use GAs to automatically generate interpretable agent decision models from empirical data. This is similar to work by Fogel [1993], Miller [1996], and Miller and Page [2007], in which GAs evolved FSMs based on their interactions with one another in simulated games, but whereas these were theoretical exercises, we are estimating models to explain and predict observed interactions among real agents. We use GAs as optimization routines for estimation because they perform well in rugged search spaces to quickly solve discrete optimization problems, are a natural complement to our binary string representation of FSMs [Goldberg and Holland, 1988], and are easily parallelized.
Duffy and Engle-Warnick [2002] combined empirical experimental data with genetic programming (GP) to model behavior. GP, with the same genetic operations as most GAs [Koza, 1992], is a process that can evolve arbitrary computer programs [Duffy, 2006]. We apply genetic operations to FSM representations rather than to all predictor variables and functional primitives because we are interested in deriving decision models with a particular structure: FSMs with latent states, rather than models conditioning on observable variables with any arbitrary functional form. With data-driven modeling, it is desirable to impose as many constraints as can be theoretically justified on the functional form of the model (see Miller and Page [2007] for interesting theoretical results related to FSM agents interacting in games). This avoids overfitting by constraining the model to a functional form that is likely generalizable across contexts, allows genetic selection to converge better, and reduces the computational effort required to explore the parameter space. An additional challenge in implementing GP is specifying the genetic operations on the function primitives while ensuring that they always produce syntactically valid programs that represent meaningful decision models. This requires fine-tuning to specific problems, which we avoid because we are designing a general method applicable across domains.

Our choice to use Moore machines as the building blocks of our decision modeling method ensures that estimation will produce easily interpretable models with latent states that can be represented graphically (see Fig. 1
for examples). Our process represents Moore machines as Gray-encoded binary strings consisting of an action vector followed by elements that form the state matrix [Savage, 1997]. For details, see Fig. 2(a) and our build_bitstring, decode_action_vec, and decode_state_mat functions. This way, genetic operators have free rein to search the global parameter space, guided by the ability of the decoded binary strings to predict the provided data.
The vast majority of the computation time for Algorithm 1 is spent evaluating the predictive accuracy of the FSMs (not the stochastic generation of candidate FSMs). To improve performance we implement this evaluation in C++ using the Rcpp package [Eddelbuettel, 2013] and, because it is embarrassingly parallel, distribute it across processor cores. We have incorporated our code into an R package with an API of documented function calls, using the GA package [Scrucca, 2013] to perform the GA evolution. A user can generate an optimized FSM by calling evolve_model(data), where data is an R data.frame object with columns representing the time period of the decision, the decision taken at that period, and any predictor variables. There are many additional optional arguments to this evolve_model function, but they have sensible default values. Our package then generates C++ code for a fitness function and uses it to evaluate automatically generated candidate models. Once the convergence criteria of this iterative search process are satisfied, the best FSM is identified, and each predictor variable is assessed by checking its identifiability and computing its importance in that decision model. The return value contains a descriptive summary of all results, including those shown in Fig. 3.
The number of states in the FSM and the number of predictor variables to include are hyperparameters that control the complexity of the model. Beginning with the simplest possible model and increasing complexity by adding states and variables, we often observe that at first out-of-sample predictive accuracy grows, because bias falls more quickly than variance rises; eventually, however, adding further complexity reduces bias less than it increases variance, and accuracy decreases [Hastie et al., 2009]. We can use cross-validation on the training data to find values for the hyperparameters that maximize predictive accuracy (Algorithm 2). We assess the out-of-sample predictive accuracy of the final model with a holdout test set of data, distinct from the cross-validation test sets in Algorithm 2. Increasing complexity to optimize predictive accuracy introduces a new trade-off, because more complex decision models are harder to interpret in human terms, so the “best” solution will depend on the goals of the analysis.

4 Experimental Game Data
The Iterated Prisoner’s Dilemma (IPD) is often used as a model of cooperation [Axelrod, 1984]. A one-shot PD game has a unique equilibrium in which each player chooses to defect even though both players would be better off if they cooperated. Suppose two players play the simultaneous-move PD game in Fig. 2, observe the choice of the other person, and then play the same simultaneous-move game again. Even in the (finitely) repeated version, no cooperation can be achieved by rational income maximizers. This tension between maximizing collective and individual gain is representative of a broad class of social situations (e.g. the “tragedy of the commons” [Hardin, 1968]).
We applied our procedure to data from laboratory experiments on human subjects playing IPD games for real financial incentives. Nay [2014] gathered and integrated data from many experiments conducted and analyzed by Bereby-Meyer and Roth [2006], Duffy and Ochs [2009], Kunreuther et al. [2009], Dal Bo and Frechette [2011], and Fudenberg et al. [2012]. All of the experiments share the same underlying repeated Prisoner’s Dilemma structure, although the details of the games differed. Nay’s data set comprises 135,388 cooperation decisions, much larger than previous studies of repeated game strategies.
Fudenberg et al. [2012] and Dal Bo and Frechette [2011] modeled their IPD experimental data with repeated game strategies; however, they applied a maximum likelihood estimation process to estimate the prevalence of a relatively small predefined set of strategies. In contrast, our estimation process automatically searches through a very large parameter space that includes all possible strategies up to a given number of states and does not require the analyst to predefine any strategies, or even understand the game.
We used 80% of our data for training and reserved the other 20% as a holdout test set. Fig. 3 shows different representations of the fittest two-state machine of a GA population evolved on the training data: the raw Gray-encoded and binary strings (Fig. 3(a)), the bitstring decoded into state-matrix and action-vector form (Fig. 3(b)), and the corresponding graph representation (Fig. 3(c)). We measure variable importance (Fig. 3(d)) by switching each value of an estimated model’s state matrix to another value in its feasible range, measuring the decrease in goodness of fit to the training data, normalizing the values, and then summing across each column to estimate the relative importance of each predictor variable (in this case, the moves each player made in the previous turn).
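This importance measure can be sketched for a generic goodness-of-fit evaluator. The function and its signature are ours, and for simplicity it reports one value per state-matrix column rather than aggregating columns that belong to the same predictor variable.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Switch each state-matrix entry to another value in its feasible range,
// record the resulting drop in training fitness, and sum the normalized
// drops down each column.
std::vector<double> column_importance(
    std::vector<std::vector<int>> mat, int n_states,
    const std::function<double(const std::vector<std::vector<int>>&)>& fit) {
    const double base = fit(mat);
    std::vector<double> drop(mat[0].size(), 0.0);
    for (std::size_t r = 0; r < mat.size(); ++r) {
        for (std::size_t c = 0; c < mat[r].size(); ++c) {
            const int orig = mat[r][c];
            mat[r][c] = (orig + 1) % n_states;  // another feasible value
            drop[c] += std::max(0.0, base - fit(mat));
            mat[r][c] = orig;                   // restore before next switch
        }
    }
    const double total = std::accumulate(drop.begin(), drop.end(), 0.0);
    if (total > 0.0)
        for (double& d : drop) d /= total;      // normalize to sum to 1
    return drop;
}
```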


Fig. 4 illustrates the GA run that evolved the FSM of Fig. 3 by predicting cooperation decisions in the IPD training games. This GA run, which took only a few seconds on a modest laptop, used common algorithm settings: a population of 175 FSMs initialized with random bitstrings. If the analyst has an informed prior belief about the subjects’ decision models, she can initialize the population with samples drawn from that prior distribution, but this paper focuses on deriving useful results from random initializations, corresponding to uniform priors, where the analyst provides only data. A linear-rank selection process used the predictive ability of individuals to select a subset of the population from which to create the next generation. A single-point crossover process was applied to the binary values of selected individuals with probability 0.8, uniform random mutation was conducted with probability 0.1, and the top 5% fittest individuals survived each generation without crossover or mutation, ensuring that potentially very good solutions would not be lost [Scrucca, 2013]. These are standard GA parameter settings and can be adjusted if convergence takes particularly long for a given dataset.
Using theoretical agent-based simulations and a fitness measure that is a function of simulated payoffs, Axelrod [1997] demonstrated the fitness of the tit-for-tat (TFT) strategy. Using a fitness measure that is a function of the ability to explain human behavior, we discovered a hybrid of TFT and grim trigger (GT), which we call Noisy Grim (NG). TFT’s state is determined solely by the opponent’s last play. GT will never exit a defecting state, no matter what the opponent does.
With traditional repeated game strategies such as TFT and GT, the player always takes the action corresponding to her current state (boldface transitions in Fig. 3(c)), but if we add noise to decisions so that the player sometimes chooses the opposite action from her current state (italic transitions in Fig. 3(c)), then the possibility arises for both the player and opponent to cooperate while the player is in the defecting state (i.e. to reach the second-row, first-column position of the state matrix in Fig. 3(b)). This would return the player to the cooperating state (see, e.g., Chong and Yao [2005]). Noisy Grim’s predictions on the holdout test data are 82% accurate, GT’s are 72% accurate, and TFT’s are 77% accurate. We also tested 16 other repeated game strategies for the IPD from Fudenberg et al. [2012]; their accuracy on the test set ranged from 46% to 77%. Our method uncovered a deterministic dynamic decision model that predicts IPD play better than all of the existing theoretical automata models of IPD play that we are aware of, and that has interesting relationships to the two best-known models, TFT and GT.
This process has allowed us to estimate a highly interpretable decision model (fully represented by the small image of Fig. 3(c)) that predicts most of the behavior of hundreds of human participants, merely by plugging in the dataset as input. We address the potential concern that the process is too tuned to this specific case study by inputting a very different dataset from the field of international relations and obtaining useful results. First, however, to test how robustly we can estimate a known model before moving to more empirical data (where the data-generating process can never be fully known), we repeatedly simulate a variety of known data-generating mechanisms and then apply the method to the resulting choice data.
5 Simulated Data
In the real world, people rarely interact strategically by strictly following a deterministic strategy [Chong and Yao, 2005]. Whimsy, strategic randomization, or error may induce a player to choose a different move from the one dictated by her strategy. To study whether our method could determine an underlying strategy that an agent overrides from time to time, we followed the approach of Fudenberg et al. [2012] and created an agent-based model of the IPD in which agents followed deterministic strategies, such as TFT and GT, but made noisy decisions: at each time period, the deterministic strategy dictates each agent’s preferred action, but the agent chooses the opposite action with probability ε, where ε ranges from 0 (perfectly deterministic play) to 0.5 (completely random play). The noise parameter ε is constant across all states of an agent’s strategy for any given simulation experiment we conducted.
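This noise model can be sketched directly; the function name and the choice of tit-for-tat as the underlying deterministic strategy are illustrative.

```cpp
#include <random>
#include <string>

// Noisy play of a deterministic strategy: each period the strategy (here
// tit-for-tat) dictates a preferred action, and the agent plays the
// opposite action with probability eps. eps = 0 is perfectly deterministic
// play; eps = 0.5 is completely random play.
std::string noisy_tft(const std::string& opponent, double eps, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution flip(eps);
    std::string moves;
    char preferred = 'C';  // tit-for-tat opens with cooperation
    for (char opp : opponent) {
        moves += flip(rng) ? (preferred == 'C' ? 'D' : 'C') : preferred;
        preferred = opp;   // next period, mirror the opponent's last move
    }
    return moves;
}
```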
When a player follows an unknown strategy, characterized by latent states, discovering the strategy (the actions corresponding to each state and transitions between the states) requires observed data that explores as much as possible of the state transition matrix defined by all possible combinations of state and predictor values (for these strategies the predictors are the history of play). Many deterministic strategy pairings can quickly reach equilibria in which players repeat the same moves for the rest of the interaction. If the player and opponent both use TFT and both make the same first move, every subsequent move will repeat the first move. If the opponent plays GT, then after the first time the player defects the opponent will defect for the rest of the session and the data will provide little information on the player’s response to cooperation by the opponent. However, if the opponent plays with noise, the play will include many instances of cooperation and defection by the opponent, and will thus sample the accessible state space for the player’s strategy more thoroughly than if the opponent plays deterministically. Indeed, this is why Fudenberg et al. [2012] added noise to action choices in their human subjects experimental games.
We simulated approximately 17 million interactions, varying paired decision models of each agent [(TFT, TFT), (TFT, GT), (GT, TFT), (GT, GT)] and also varying the noise parameter (0, 0.025, … , 0.5) for each of two noise conditions: where both players made equally noisy decisions, and where only the opponent made noisy decisions while the player under study strictly followed a deterministic strategy. We ran 25 replicates of each of the 168 experimental conditions, with 4,000 iterations of game play for each replicate, and then applied the FSM estimation method to each replicate of the simulated choice data to estimate the strategy that the agent player under study was using.
Being in state/row i (e.g. row 2) corresponds to the player taking the action for that state (e.g. D) in the current turn. All entries in row i corresponding to the player taking that action in the current period (e.g. columns 2 and 4 for D) are identifiable. Entries in row i that correspond to not taking that action in the current period (e.g. columns 1 and 3 for row 2) represent transitions that cannot occur in strictly deterministic play, so their values cannot affect play and thus cannot be determined empirically. We take this into account when testing the method’s ability to estimate underlying deterministic models: this is why only 6 elements of a 10-element TFT or GT matrix can be identified (Fig. 5). We also take this into account when estimating models from empirical data, where the data-generating process is assumed to be stochastic: each element of the matrix that would be inaccessible under deterministic play is identified, and the fitness is calculated with a strategy matrix in which that element is changed to its complement (“flipped”). If flipping the element does not change the fitness, then the two complementary strategies are indistinguishable and the element in question cannot be determined empirically. If each element decreases the fitness when it is flipped, then the strategy corresponds to a deterministic approximation of a stochastic process and all of the elements of the state matrix can be identified.
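The count of identifiable elements under strictly deterministic play follows directly from this argument. A sketch, using our 0-indexed columns ordered cc, cd, dc, dd with the player's own previous move listed second (so columns cd and dd presume an own move of D, matching the 1-indexed "columns 2 and 4" above):

```cpp
#include <vector>

// An FSM that always plays action[s] in state s can only have reached s
// having just played action[s], so transition columns whose own-move half
// disagrees with action[s] can never be exercised. Count the elements that
// remain identifiable: every action entry plus the consistent transitions.
int identifiable_count(const std::vector<char>& action) {
    int n = static_cast<int>(action.size());  // action entries always count
    for (char a : action) {
        for (int col = 0; col < 4; ++col) {
            char own = (col % 2 == 1) ? 'D' : 'C';  // cd, dd presume own D
            if (own == a) ++n;  // transition reachable in deterministic play
        }
    }
    return n;
}
```

For a two-state TFT or grim-trigger machine this yields 6 of the 10 bitstring elements, as stated above.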
When the noise parameter was zero, most of the models estimated by the GA had at least two incorrect elements. However, for moderate amounts of noise, all of the models estimated by the GA were correct (see Fig. 4(a)). For higher noise levels in the player, the amount of error rose rapidly with increasing noise, as expected: at ε = 0.5 the player chooses moves completely at random, so there is no strategy to discover. When a strictly deterministic player faced a noisy opponent, the GA correctly identified the player’s strategy for all noise levels above zero (see Fig. 4(b)).
6 Observational Data
To extend this method to more complex situations, the predictor variables (columns of the state matrices) can include any time-varying variable relevant to an agent’s decision. In context-free games such as the IPD, the only predictor variables are the moves the players made in the previous turn, but models of strategic interactions in context-rich environments may include other relevant variables.
We find it difficult to interpret graphical models with more than four predictors, but an analyst who has many potentially relevant predictor variables and is unable to use theory alone to reduce the number of predictors sufficiently to generate easily interpretable models with our method could take one of four courses of action (listed in order of increasing reliability and computation time):

1. Before FSM estimation, apply a (multivariate or univariate) predictor variable selection method.

2. Before FSM estimation, estimate an arbitrary predictive model that can produce variable importance rankings, and then use the top-ranked predictors for FSM estimation.

3. After FSM estimation with all predictors, inspect the returned predictor variable importance ranking, remove all but the top-ranked predictors from the dataset, and re-run estimation.

4. Conduct FSM estimation with all combinations of a small number of predictors drawn from the full set of relevant predictors, and choose the estimated model with the best performance (usually the highest out-of-sample accuracy).
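The last option, exhaustive estimation over predictor subsets, needs only an enumeration of the k-element combinations of the variable names. This helper is ours, and the variable names (including "trade" as a hypothetical fourth predictor) are illustrative:

```cpp
#include <string>
#include <vector>

// Enumerate every k-element subset of the predictor names so that FSM
// estimation can be run once per subset and the best-performing model kept.
void combine(const std::vector<std::string>& vars, std::size_t k,
             std::size_t start, std::vector<std::string>& current,
             std::vector<std::vector<std::string>>& out) {
    if (current.size() == k) {
        out.push_back(current);  // one complete k-element subset
        return;
    }
    for (std::size_t i = start; i < vars.size(); ++i) {
        current.push_back(vars[i]);
        combine(vars, k, i + 1, current, out);  // extend with later vars only
        current.pop_back();
    }
}
```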
We illustrate the use of extra predictor variables by applying our method to an example from international relations involving repeated water-management interactions between countries that share rivers. We use data compiled by Brochmann [2012] on treaty signing and cooperation over water quality, water quantity, and flood control from 1948–1999 to generate a model for predicting whether two countries will cooperate. We used three lagged variables: whether there was water-related conflict between the countries in the previous year, whether they cooperated over water in the previous year, and whether they had signed a water-related treaty during any previous year. This data set was too small to divide into training and holdout subsets for assessing predictive accuracy, so we report the models’ accuracy in reproducing the training data (a random-choice model is 50% accurate). A two-state decision model (Fig. 5(a)) is 73% accurate, a three-state model (Fig. 5(c)) is 78% accurate, and a four-state model is 82% accurate but is not shown because its complexity makes it difficult to interpret visually.
Accuracy can be a problematic measure when the classes are imbalanced, i.e. when a class the model is trying to predict is rare. Many alternatives to accuracy are available that illuminate different aspects of predictive power. For instance, precision is the proportion of (cooperation) event signals predicted by our models that are correct, and recall is the proportion of actual events that our models predict. For this subset of the dataset, cooperation and non-cooperation were almost evenly distributed, so to maintain comparability with the experimental and simulated data we used accuracy as the fitness measure.
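For completeness, the three measures can be sketched over aligned prediction and observation strings, with 'C' marking a cooperation event. The names are ours, and the sketch assumes at least one predicted and one actual event (otherwise precision or recall is undefined).

```cpp
#include <string>

// accuracy:  share of periods predicted correctly.
// precision: share of predicted cooperation signals that were correct.
// recall:    share of actual cooperation events that were predicted.
struct Scores { double accuracy, precision, recall; };

Scores score(const std::string& pred, const std::string& obs) {
    double tp = 0, fp = 0, fn = 0, correct = 0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        if (pred[i] == obs[i]) ++correct;
        if (pred[i] == 'C' && obs[i] == 'C') ++tp;  // true positive
        if (pred[i] == 'C' && obs[i] != 'C') ++fp;  // false alarm
        if (pred[i] != 'C' && obs[i] == 'C') ++fn;  // missed event
    }
    const double n = static_cast<double>(pred.size());
    return {correct / n, tp / (tp + fp), tp / (tp + fn)};
}
```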

In the two-state model, whether or not the countries cooperated in the previous year, the combination of conflict and treaty-signing in the previous year always produces cooperation, whereas conflict without treaty-signing in the previous year always produces non-cooperation. In the three-state model, three of the four outcomes that include conflict lead to a transition from non-cooperation to cooperation, and four of the six outcomes that cause transitions from the cooperation states to non-cooperation are non-conflict outcomes. While this does not tell us anything decisive about the role of conflict, it suggests that conflict may play a counterintuitive role in promoting cooperation. Brochmann [2012], using a bivariate probit simultaneous equation model, reports a similar finding: “In the aftermath of conflict, states may be particularly eager to solve important issues that could cause future problems” (p. 159).
7 Discussion
This paper outlined a method for estimating interpretable models of dynamic decision-making. By estimating a global, deterministic, simple function for a given dataset, imposing constraints on the number of predictor variables, and providing options for reducing the number of predictor variables, our process facilitates capturing a significant amount of information in a compact and useful form. The method can be used for designing empirically grounded agent models in agent-based simulations and for gaining direct insight into observed behaviors of real agents in social and physical systems. Combining state matrices and a genetic algorithm has proven effective for simulated data, experimental game data, and observational international relations data. With the simulated data, we successfully recovered the exact underlying models that generated the data. With the real data, we estimated simple deterministic approximations that explain most of the structure of the unknown underlying process. We discovered a theoretically interesting dynamic decision model that predicted IPD play better than all of the existing theoretical models of IPD play that we were aware of.
We have released an open-source R package that implements the methods described here to estimate any time-series classification model that uses a small number of binary predictor variables and moves back and forth between the values of the outcome variable over time. Larger sets of predictor variables can be reduced to smaller sets by applying one of the four approaches outlined in Section 6. Although the predictor variables must be binary, a quantitative variable can be converted to binary by dividing the observed values into high and low classes. Future releases of the package may include additional estimation methods to complement GA optimization.
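The high/low conversion mentioned here can be done with a median split. The threshold choice in this sketch is ours; an analyst may prefer a theoretically motivated cut point.

```cpp
#include <algorithm>
#include <vector>

// Convert a quantitative variable to binary by splitting the observed
// values into high (1) and low (0) classes at the (upper) median.
std::vector<int> to_high_low(const std::vector<double>& x) {
    std::vector<double> sorted(x);
    std::sort(sorted.begin(), sorted.end());
    const double median = sorted[sorted.size() / 2];  // upper median if n even
    std::vector<int> binary;
    for (double v : x) binary.push_back(v >= median ? 1 : 0);
    return binary;
}
```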
Acknowledgments
We gratefully acknowledge the authors of R [R Core Team, 2014]. This manuscript was prepared using knitr [Xie, 2014]. We would like to thank Yevgeniy Vorobeychik for discussions on predicting games.
This work was supported by U.S. National Science Foundation grants EAR1416964 and EAR1204685.
References
 Arifovic [1994] Jasmina Arifovic. Genetic algorithm learning and the cobweb model. Journal of Economic Dynamics and Control, 18:3–28, 1994.
 Arifovic and Eaton [1995] Jasmina Arifovic and Curtis Eaton. Coordination via genetic learning. Computational Economics, 8:181–203, 1995. doi: 10.1007/BF01298459.
 Axelrod [1984] Robert M. Axelrod. The Evolution of Cooperation. Basic Books, New York, 1984.
 Axelrod [1997] Robert M. Axelrod. The Complexity of Cooperation: Agentbased Models of Competition and Collaboration. Princeton University Press, Princeton, 1997.
 BerebyMeyer and Roth [2006] Yoella BerebyMeyer and Alvin E. Roth. The speed of learning in noisy games: Partial reinforcement and the sustainability of cooperation. The American Economic Review, 96:1029–1042, 2006.
 Brochmann [2012] Marit Brochmann. Signing river treaties: Does it improve river cooperation? International Interactions, 38:141–163, 2012. doi: 10.1080/03050629.2012.657575.
 Bullard and Duffy [1999] James Bullard and John Duffy. Using genetic algorithms to model the evolution of heterogeneous beliefs. Computational Economics, 13:41–60, 1999. doi: 10.1023/A:1008610307810.
 Camerer [2003] Colin F. Camerer. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press, Princeton, 2003.
 Chong and Yao [2005] S. Y. Chong and Xin Yao. Behavioral diversity, choices and noise in the iterated prisoner’s dilemma. IEEE Transactions on Evolutionary Computation, 9:540–551, 2005.
 Dal Bo and Frechette [2011] Pedro Dal Bo and Guillaume R. Frechette. The evolution of cooperation in infinitely repeated games: Experimental evidence. American Economic Review, 101:411–429, 2011. doi: 10.1257/aer.101.1.411.
 Duffy [2006] John Duffy. Agentbased models and human subject experiments. In Handbook of Computational Economics, volume 2, pages 949–1011. Elsevier, Amsterdam, 2006.
 Duffy and EngleWarnick [2002] John Duffy and Jim EngleWarnick. Using symbolic regression to infer strategies from experimental data. In Evolutionary Computation in Economics and Finance, pages 61–82. Springer, New York, 2002.
 Duffy and Ochs [2009] John Duffy and Jack Ochs. Cooperative behavior and the frequency of social interaction. Games and Economic Behavior, 66:785–812, 2009. doi: 10.1016/j.geb.2008.07.003.
 Eddelbuettel [2013] Dirk Eddelbuettel. Seamless R and C++ Integration with Rcpp. Springer, New York, 2013.
 Fogel [1993] David B. Fogel. Evolving behaviors in the iterated prisoner’s dilemma. Evolutionary Computation, 1:77–97, 1993.
 Fudenberg et al. [2012] Drew Fudenberg, David G Rand, and Anna Dreber. Slow to anger and fast to forgive: Cooperation in an uncertain world. American Economic Review, 102:720–749, 2012. doi: 10.1257/aer.102.2.720.
 Goldberg and Holland [1988] David E. Goldberg and John H. Holland. Genetic algorithms and machine learning. Machine Learning, 3:95–99, 1988. doi: 10.1023/A:1022602019183.
 Hanaki [2004] Nobuyuki Hanaki. Action learning versus strategy learning. Complexity, 9:41–50, 2004.
 Hanaki et al. [2005] Nobuyuki Hanaki, Rajiv Sethi, Ido Erev, and Alexander Peterhansl. Learning strategies. Journal of Economic Behavior & Organization, 56:523–542, 2005. doi: 10.1016/j.jebo.2003.12.004.
 Hardin [1968] Garrett Hardin. The tragedy of the commons. Science, 162:1243–1248, 1968. doi: 10.1126/science.162.3859.1243.
 Hastie et al. [2009] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer, New York, NY, 2nd edition, 2009. ISBN 9780387848570.
 Koza [1992] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Bradford, Cambridge, MA, 1992.
 Kunreuther et al. [2009] Howard Kunreuther, Gabriel Silvasi, Eric T. Bradlow, and Dylan Small. Bayesian analysis of deterministic and stochastic prisoner’s dilemma games. Judgment and Decision Making, 4:363–384, 2009.
 Marks et al. [1995] Robert E. Marks, David F. Midgley, and Lee G. Cooper. Adaptive behaviour in an oligopoly. In Jörg Biethahn and Volker Nissen, editors, Evolutionary Algorithms in Management Applications, pages 225–239. Springer, New York, 1995.
 McKelvey and Palfrey [2001] Richard D. McKelvey and Thomas R. Palfrey. Playing in the dark: Information, learning, and coordination in repeated games. Technical report, California Institute of Technology, Pasadena, 2001.
 Midgley et al. [1997] David F. Midgley, Robert E. Marks, and Lee C. Cooper. Breeding competitive strategies. Management Science, 43:257–275, 1997. doi: 10.1287/mnsc.43.3.257.
 Miller [1996] John H. Miller. The coevolution of automata in the repeated prisoner’s dilemma. Journal of Economic Behavior & Organization, 29:87–112, 1996.
 Miller and Page [2007] John H. Miller and Scott E. Page. Complex Adaptive Systems: An Introduction to Computational Models of Social Life. Princeton University Press, Princeton, 2007.
 Moore [1956] Edward Moore. Gedankenexperiments on sequential machines. Automata Studies, 34:129–153, 1956.
 Nay [2014] John Jacob Nay. Predicting cooperation and designing institutions: An integration of behavioral data, machine learning, and simulation. In Winter Simulation Conference Proceedings, Savannah, GA, December 2014.
 Osborne and Rubinstein [1994] Martin J. Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, Cambridge, MA, 1994.
 R Core Team [2014] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2014. URL http://www.Rproject.org/.
 Rubinstein [1986] Ariel Rubinstein. Finite automata play the repeated prisoner’s dilemma. Journal of Economic Theory, 39(1):83–96, June 1986. ISSN 0022-0531. doi: 10.1016/0022-0531(86)90021-9. URL http://www.sciencedirect.com/science/article/pii/0022053186900219.
 Savage [1997] C. Savage. A Survey of Combinatorial Gray Codes. SIAM Review, 39(4):605–629, January 1997. ISSN 00361445. doi: 10.1137/S0036144595295272. URL http://epubs.siam.org/doi/abs/10.1137/S0036144595295272.
 Scrucca [2013] Luca Scrucca. GA: A package for genetic algorithms in R. Journal of Statistical Software, 53:1–37, 2013. URL http://www.jstatsoft.org/v53/i04/.
 Xie [2014] Yihui Xie. Dynamic Documents with R and knitr. Chapman & Hall/CRC, Boca Raton, 2014.