Mining Determinism in Human Strategic Behavior

11/11/2012 ∙ by Rustam Tagiew, et al. ∙ TU Bergakademie Freiberg

This work lies at the intersection of experimental economics and data mining. It continues the author's previous work on mining behaviour rules of human subjects from experimental data, in settings where game-theoretic predictions partially fail. Game-theoretic predictions, i.e. equilibria, tend to succeed only with experienced subjects playing specific games, which is rarely the case. Apart from game theory, contemporary experimental economics offers a number of alternative models. In the relevant literature, these models are always biased by psychological and near-psychological theories and are claimed to be confirmed by the data. This work introduces a data mining approach to the problem that does not rely on an extensive psychological background. Apart from determinism, no other assumptions are imposed. Two datasets from different human subject experiments are used for evaluation: the first is a repeated mixed strategy zero sum game and the second a repeated ultimatum game. As a result, a way of mining deterministic regularities in human strategic behaviour is described and evaluated. As future work, the design of a new representation formalism is discussed.


1 Introduction

Game theory is one of many scientific disciplines predicting the outcomes of social, economic and competitive interactions among humans at the granularity level of individual decisions [1, p.4]. People are assumed to be autonomous and intelligent, and to decide according to their preferences. People can be regarded as rational if they always make the decisions whose execution, according to their subjective estimation, has the most preferred consequences [2, 3]. The correctness of the subjective estimation depends on the level of intelligence. Rationality can justify one's own decisions and predictions of other people's decisions. If interacting people satisfy the concept of rationality and apply this concept mutually and even recursively, the interaction is called strategic interaction (SI). Further, a game is the notion for the formal structure of a concrete SI [4]. A definition of a game consists of the number of players, their legal actions and the players' preferences. The preferences can be replaced by a payoff function under assumed payoff maximization. The payoff function defines each player's outcome depending on his actions, the other players' actions and random events in the environment. The game-theoretic solution of a game is a prediction about the behavior of the players, aka an equilibrium. The assumption of rationality is the basis for an equilibrium. Deviating from an equilibrium is beyond rationality, because it does not maximize the payoff. Not every game has an equilibrium in pure strategies. However, there is at least one mixed strategy equilibrium (MSE) in finite games [5].
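For concreteness, the equilibrium notion used throughout this paper can be stated in standard textbook form (this formulation is added here only for reference and is not quoted from the cited sources): a profile of mixed strategies is an equilibrium if no player can increase his expected payoff by deviating unilaterally.

```latex
% Nash equilibrium in mixed strategies, standard notation:
% \sigma^* is a profile of mixed strategies, \Sigma_i the set of player i's
% mixed strategies, and u_i player i's expected payoff.
\[
(\sigma_1^*,\dots,\sigma_n^*) \text{ is an equilibrium} \iff
u_i(\sigma_i^*,\sigma_{-i}^*) \;\ge\; u_i(\sigma_i,\sigma_{-i}^*)
\quad \text{for every player } i \text{ and every } \sigma_i \in \Sigma_i .
\]
```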
The notion of a game is commonly used for pleasant pastimes like board games, but it can also be extended to all social, economic and competitive interactions among humans. A board game can have the same game structure as a war. Some board games were even developed to train people, like the Prussian army war game "Kriegspiel Chess" [6] for officers. We like to train ourselves in order to perform better at games [7]. In most cases, common human behavior in games deviates from game-theoretic predictions [8, 9]. One can say without any doubt that if a human player is trained in a concrete game, he will perform close to equilibrium. But a chess master does not also play poker perfectly, and vice versa. On the other hand, a game theorist can find a way to compute an equilibrium for a game, but that does not make a successful player out of him. There are many games we can play; for most of them, we are not trained. That is why it is more important to investigate our behavior when playing general games than when playing a concrete game at expert level. Conducting experiments to gather data on human game playing is called experimental economics.
Although general human preferences are a subject of philosophical discussion [10], game theory assumes that they can be captured as required for modeling rationality. Regarding people as rational agents is disputed at least in psychology, where even scientifically accessible argumentation exposes the existence of stable and consistent human preferences as a myth [11]. The problems of human rationality cannot be explained by bounded cognitive abilities alone. "British people argue that it is worth spending billions of pounds to improve the safety of the rail system. However, the same people habitually travel by car rather than by train, even though traveling by car is approximately 30 times more dangerous than by train!" [12, p.527–530] Nevertheless, for the last six decades the common scientific standard for econometric experiments has been that subjects' preferences over outcomes can be insured by paying differing amounts of money [13]. However, insuring preferences with money is criticized as well, under the label "Homo Economicus".
The ability to model other people's rationality and reasoning corresponds to the psychological term "Theory of Mind" (ToM) [14], which is lacking almost only in cases of autism. In experimental economics, subjects as well as researchers, both of whom are supposed to be non-autistic, may nevertheless fail at modeling others' minds. In the Wason task, at least, subjects' reasoning does not match the researchers' [15]. Human rationality is not restricted to the capability for science-grade logical reasoning – rational people may use no logic at all [16]. However, people also make serious mistakes in the calculus of probabilities [17]. In mixed strategy games, the required sequence of random decisions cannot be properly generated by people [18]. Due to bounded cognitive abilities, every "random" decision depends on previous ones and is predictable in this way. In ultimatum games [9, p. 43ff], the economists' misconception of human preferences is revealed – people's minds value fairness in addition to personal enrichment. Our minds originate from a time when private property had not yet been invented and social values like fairness were essential for survival.
This work concentrates on human playing of general games and continues the author's previous work [19]. It concerns the common human deviations from predicted equilibria in games for which the players are neither trained nor experienced. The two examples examined in this work are a repeated mixed strategy zero sum game and a repeated ultimatum game from the responders' perspective. The only assumption is the existence of deterministic rules in human behavior. Under this assumption, diverse data mining algorithms are evaluated. Apart from mining deterministic regularities, modeling human behavior in general games needs a representation formalism which is not specific to a concrete game. Representing human behavior models in such a formalism would increase their comparability. Therefore, this paper includes a general discussion of such a formalism, into which the results of the evaluation are incorporated.
The next section summarizes related work on formalisms for human behavior in games. Afterwards, the datasets and the data mining approach are presented. A summary and discussion conclude the paper.

2 Related Work

A very comprehensive survey of work in experimental psychology and economics on human behavior in general games can be found in [9]. This work inspired research in artificial intelligence [20], which led to the creation of networks of influence diagrams (NID) as a representation formalism. NID is a formalism similar to the possible worlds semantics of Kripke models [21] and is a super-set of Bayesian games. The main idea of NID is to model human reasoning patterns in diverse SIs. Every node of a NID is a multi-agent influence diagram (MAID) representing an agent's model of the SI. A MAID is an influence diagram (ID) in which every decision node is associated with an agent. An ID is a Bayesian network (BN) containing ordinary nodes, decision nodes and utility nodes. In summary, this approach assumes that human decision making can be modeled using BNs – human reasoning is assumed to have a non-deterministic structure. This formalism has already been applied to modeling reciprocity in a repeated ultimatum game called "Colored Trails" (CT) [22]. The result of that work is that models of adaptation to human behavior based on BNs perform better than standard game-theoretic algorithms.
Another independent line of work is the application of a cognitive architecture from psychology to games [23]. A cognitive architecture is a formalism intended to represent general human reasoning [24] in order to compare different models. Today's most popular cognitive architecture is ACT-R (Adaptive Control of Thought – Rational) [25]. In contrast to NID, ACT-R has been used in a large number of psychological studies. ACT-R consists of two tiers – a symbolic and a sub-symbolic one. On the symbolic tier, there are chunks – facts and "If-Then" rules. On the sub-symbolic tier, there are exponential functions which determine the activation levels of chunks, delays in reasoning and priorities between rules. Based on ACT-R, an almost deterministic model for the mixed strategy zero sum game "Rock Paper Scissors" (RPS) was designed. The only case in which this model predicts random behavior is the beginning of a game sequence. The model was successfully evaluated as the basis for an artificial player, which won against human subjects.


Whether deterministic or not, both works follow the same approach. First, they construct a model based on theoretical considerations. Second, they adjust the parameters of this model to the experimental data. This makes human behavior explainable using the concepts from the model, but it requires a priori knowledge to construct the model.

3 Used Datasets

The first dataset chosen for our data mining approach has already been mentioned in our previous work [19]. It is the game RPS played over a computer network. This game is easy to explain and most people do not train to play it at expert level; it is symmetric, zero sum and two player. The study was conducted on threads of one-shot games. A player had a delay of … sec for consideration of every shot. If he did not react, the last or default gesture was chosen. A thread lasted …. This game has one mixed strategy equilibrium (MSE), which is the equal probability distribution over the three gestures. At the very least, one cannot lose in expectation by playing this MSE.
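A short worked check of this last claim (added for clarity, not taken from the original paper, and assuming the usual win/draw/loss payoffs of +1/0/−1 rather than the monetary amounts used in the experiment): against any opponent mixture (p_R, p_P, p_S) over rock, paper and scissors, the uniform strategy earns an expected payoff of zero, so it cannot lose on average.

```latex
% Expected payoff of the uniform mixture (1/3, 1/3, 1/3) against an arbitrary
% opponent mixture (p_R, p_P, p_S), with win = +1, draw = 0, loss = -1.
\[
\mathbb{E}[u] \;=\; \tfrac{1}{3}(p_S - p_P) \;+\; \tfrac{1}{3}(p_R - p_S) \;+\; \tfrac{1}{3}(p_P - p_R) \;=\; 0 .
\]
```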


Ten computer science undergraduates were recruited. They were on average … years old and … of them were male. They had to play the thread twice against another test person. Between the two threads, they played other games. In this way, … one-shot games, i.e. single human decisions, were gathered. Every person received €0.02 for a won one-shot game and €0.01 for a draw. The persons who played against each other sat in two separate rooms. One of the players used a cyber-glove and the other one a mouse as the input device for gestures. The graphical user interface showed the following information: the own last and current choice, the opponent's last choice, a timer and the money already gained. According to the statements of the persons, they had no problems understanding the game rules or choosing a gesture in time. All winners and …% of the losers attested that they had fun playing the game.
The second dataset is the recorded responder behavior from the CT experiment [22]. This dataset contains … single human decisions of the participating subjects. A positive decision of the responder updates the monetary payoff of both players, while a negative one does not change anything. The payoff update for the responder varied between $1.45 and −$1.35. In … cases, the responder's update was zero. The equilibrium strategy for the responder is to accept only proposals which increase his payoff, regardless of the proposer's payoff.

4 Methods

Statistical analysis of the datasets from the previous section showed that the human behavior observed in the experiments cannot be explained using game theory alone [1]. The shape of the equilibrium deviations matches the one reported in the relevant literature [9]. The goal is to find a model beyond game theory for the prediction of the average deviations. In the related work, the creation of a sophisticated model preceded the evaluation on the data. In this work, the evaluation on the data precedes the creation of a model. Of course, some people, such as trained or otherwise experienced individuals, would not fit such a model. Prediction of specific individuals is not addressed in this paper.
Machine prediction without participation in the game playing with human subjects should not be confused with the prediction algorithms of artificial players. Quite the contrary: artificial players can manipulate the predictability of human subjects through their own behavior. For instance, an artificial player which always throws "Paper" in RPS would succeed at predicting a human opponent who always throws "Scissors" in reaction. Conversely, if an artificial player maximizes its payoff based on opponent modeling, it will face a change in human behavior and will have to handle it. This case is more complex than a spectator prediction model for a humans-only interaction. This paper restricts itself to a prediction model without participation.
Human behavior can be modeled as either deterministic or non-deterministic. Although human subjects fail at generating truly random sequences as demanded by an MSE, non-deterministic models are used, especially in the case of artificial players, in order to handle uncertainties. "Specifically, people are poor at being random and poor at learning optimal move probabilities because they are instead trying to detect and exploit sequential dependencies. … After all, even if people do not process game information in the manner suggested by the game theory player model, it may still be the case that across time and across individuals, human game playing can legitimately be viewed as (pseudo) randomly emitting moves according to certain probabilities." [23] In the addressed case of spectator prediction models, the non-deterministic view can be regarded as too shallow, because deterministic models allow much more exact predictions. Non-deterministic models are only useful in cases where a proper clarification of the uncertainties is either impossible or too costly. As a reminder, deterministic models should not be assumed to necessarily take the shape of formal logic.
The deterministic function HD denotes a human decision: its input g is the sequence of previous turns in the game, and its output d = HD(g) is the decision taken. Finding a hypothesis which matches the regularity between input and output without a priori knowledge is a typical problem called supervised learning [26]. There is already a large number of algorithms for supervised learning. Each algorithm has its own hypothesis space (HS). For a Bayesian learner, e.g., the hypothesis space is the set of all possible Bayesian networks. There are many different types of hypothesis spaces – rules, decision trees, Bayesian models, functions and so on. A concrete hypothesis h ∈ HS is a relationship between input and output described using the formal means of the corresponding hypothesis space.

Which hypothesis space is most appropriate to contain valid hypotheses about human behavior? This is the machine learning version of the question about a formalism for human behavior. The most appropriate hypothesis space contains the most correct hypothesis for every concrete example of human behavior. A correct hypothesis does not only perform well on the given data (training set); it also performs well on new data (test set). Further, it is assumed that the algorithms which choose a hypothesis perform equally well across all hypothesis spaces. For instance, a decision tree algorithm creates a tree and a neural network algorithm creates a neural network, and the distance of the created tree from the best possible tree is the same as the distance of the created neural network from the best possible neural network. This assumption is a useful simplification of the problem for a preliminary demonstration. Using it, one can consider the algorithm with the best performance on the given data to be the algorithm with the most appropriate hypothesis space. The standard method for measuring the performance of a machine learning algorithm, or classifier, is cross validation.


As already mentioned, a machine learning algorithm has to find a hypothesis h which best matches the real human behavior function HD. Due to bounded resources, human decision making depends mostly on a small part of the history. This means that one needs a simplification function S, which reduces the full history to the relevant part. Using the function S, the function HD is approximated through the composition h ∘ S. The problem of finding the most appropriate hypothesis can then be formulated as in equation (1). The function match in equation (1) is considered to be implemented through a cross validation run.

    h* = argmax_{h ∈ HS} match(HD, h ∘ S)        (1)

5 Empirical Results

The first dataset is transformed into a set of tuples, each consisting of the three own previous gestures, the three previous gestures of the opponent and the own next gesture. Therefore, every tuple has length 7. The simplification function S is thus a window over the three last turns. There are 3^7 = 2187 possible tuples for RPS. The decisions in the first three turns of a game are not considered. The size of the resulting set is … tuples. The second dataset is also transformed into a set of tuples, where every tuple includes the proposer's payoff update, the responder's payoff update and the responder's boolean reply.
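To make the transformation concrete, the following sketch shows one way the 7-element tuples could be derived from a recorded thread. It is not the original preprocessing code; the class and record names are chosen here only for illustration, and the output rows are printed in a comma-separated form that a WEKA ARFF file could hold.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: turn a recorded RPS thread into 7-element tuples
// (3 own previous gestures, 3 opponent previous gestures, own next gesture).
public class RpsTupleBuilder {

    /** One recorded turn: the gestures both players actually threw. */
    public record Turn(String own, String opponent) {}

    /** Sliding window of size 3 over the history; the first 3 turns yield no tuple. */
    public static List<String[]> buildTuples(List<Turn> thread) {
        List<String[]> tuples = new ArrayList<>();
        for (int t = 3; t < thread.size(); t++) {
            String[] tuple = new String[7];
            for (int k = 0; k < 3; k++) {
                tuple[k]     = thread.get(t - 3 + k).own();       // own last three gestures
                tuple[3 + k] = thread.get(t - 3 + k).opponent();  // opponent's last three gestures
            }
            tuple[6] = thread.get(t).own();                        // class attribute: own next gesture
            tuples.add(tuple);
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<Turn> thread = List.of(
                new Turn("rock", "paper"), new Turn("paper", "paper"),
                new Turn("scissors", "rock"), new Turn("rock", "scissors"),
                new Turn("paper", "rock"));
        for (String[] tuple : buildTuples(thread)) {
            System.out.println(String.join(",", tuple)); // one data row per tuple
        }
    }
}
```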
Implementations of classifiers provided by WEKA [27] are used for the cross validation on both sets of tuples. For the first dataset, there are … classifiers available in the WEKA library which can handle multi-valued nominal classes. Gestures in RPS are nominal, because there is no order between them. These classifiers belong to different groups – rule-based, decision trees, function approximators, Bayesian learners, instance-based and miscellaneous. A cross validation of all these classifiers on the RPS dataset is performed. For the CT dataset, a cross validation of the appropriate classifiers is performed. The number of subsets for cross validation is …. Both cross validation runs are conducted preserving the order of the tuples.
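Such a cross validation run can be reproduced with WEKA's Java API roughly as follows. This is a sketch under assumptions: the ARFF file name is hypothetical, the fold count is a placeholder because the value used in the paper is not readable here, and a manual fold loop is used instead of Evaluation.crossValidateModel so that the order of the tuples is preserved.

```java
import java.util.List;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.classifiers.rules.OneR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: order-preserving k-fold cross validation of WEKA classifiers on the tuple set.
public class OrderPreservingCV {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("rps_tuples.arff"); // hypothetical file name
        data.setClassIndex(data.numAttributes() - 1);        // class attribute = own next gesture

        int folds = 10; // placeholder; the fold count used in the paper is not given here
        List<Classifier> classifiers = List.of(new SMO(), new OneR());
        for (Classifier clf : classifiers) {
            Evaluation eval = new Evaluation(data);
            for (int i = 0; i < folds; i++) {
                // trainCV/testCV without a Random argument keep the original tuple order.
                Instances train = data.trainCV(folds, i);
                Instances test = data.testCV(folds, i);
                clf.buildClassifier(train);    // buildClassifier re-initializes the model
                eval.evaluateModel(clf, test); // accumulates statistics over the folds
            }
            System.out.printf("%s: %.2f%% correct%n",
                    clf.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}
```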
Sequential minimal optimization (SMO) [28] showed …% prediction correctness, which is about …% higher than that of the sophisticated non-deterministic model for RPS of Warglien [29]. Unfortunately, both decreasing and increasing the window size in the function S for the RPS dataset diminishes the performance. Using the single rule classifier (OneR), one can find that …% of the RPS dataset matches the rule: "Choose paper after rock, scissors after paper and rock after scissors." A number of classifiers including SMO achieve …% correctness on the CT dataset in cross validation. One of these algorithms is based on decision tables [30]. This algorithm finds that …% of the CT dataset conforms to the rule: "If an acceptance neither changes your payoff nor improves the proposer's payoff, then refuse!" This result clearly outperforms the …% reported for the non-deterministic approach of Pfeffer [22].
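To make the contrast between the game-theoretic prediction and the mined regularity explicit, both responder rules can be written as simple predicates. This is an illustrative sketch, not the decision-table output itself; the mined rule is encoded exactly as quoted above and deliberately says nothing about the remaining cases.

```java
// Illustrative sketch of the two responder rules for the CT ultimatum data.
public final class ResponderRules {

    /** Game-theoretic equilibrium: accept exactly the proposals that increase own payoff,
     *  regardless of the proposer's payoff. */
    public static boolean equilibriumAccepts(double responderUpdate, double proposerUpdate) {
        return responderUpdate > 0;
    }

    /** Mined regularity as quoted in the text: refuse whenever acceptance neither changes
     *  own payoff nor improves the proposer's payoff. (Other cases are left open.) */
    public static boolean minedRuleRefuses(double responderUpdate, double proposerUpdate) {
        return responderUpdate == 0 && proposerUpdate <= 0;
    }

    public static void main(String[] args) {
        // A proposal that changes neither payoff: both views predict refusal.
        System.out.println(!equilibriumAccepts(0.0, 0.0) && minedRuleRefuses(0.0, 0.0)); // true
        // A proposal that only helps the proposer: the mined rule no longer forces refusal,
        // while the equilibrium still predicts refusal.
        System.out.println(minedRuleRefuses(0.0, 1.0));   // false
        System.out.println(equilibriumAccepts(0.0, 1.0)); // false
    }
}
```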

6 Conclusion

Strategic behavior consists of observable actions, whose origins we try to understand as generally as possible. Summarizing the results of this work, it can be said that SMO finds the most general deterministic hypotheses about the regularities of human behavior in the investigated scenarios. The correctness of such hypotheses exceeds the numbers reported in related work. The hypothesis space of SMO is one of complex functions and can be used for the design of a formalism for describing game behavior.

References

  • [1] Tagiew, R.: Strategische Interaktion realer Agenten: Ganzheitliche Konzeptualisierung und Softwarekomponenten einer interdisziplinären Forschungsinfrastruktur. PhD thesis, TU Bergakademie Freiberg (2011)
  • [2] Russell, S., Norvig, P.: Artificial Intelligence. Pearson Education (2003)
  • [3] Osborne, M.J., Rubinstein, A.: A course in game theory. MIT Press (1994)
  • [4] Morgenstern, O., von Neumann, J.: Theory of Games and Economic Behavior. Princeton University Press (1944)
  • [5] Nash, J.: Non-cooperative games. Annals of Mathematics 54 (1951) 286–295
  • [6] Li, D.H.: Kriegspiel: Chess Under Uncertainty. Premier (1994)
  • [7] Genesereth, M.R., Love, N., Pell, B.: General game playing: Overview of the AAAI competition. AI Magazine 26(2) (2005) 62–72
  • [8] Pool, R.: Putting game theory to the test. Science 267 (1995) 1591–1593
  • [9] Camerer, C.F.: Behavioral Game Theory. Princeton University Press (2003)
  • [10] Stevenson, L., Haberman, D.L.: Ten Theories of Human Nature. OUP USA (2004)
  • [11] Bazerman, M.H., Malhotra, D.: Economics wins, psychology loses, and society pays. In De Cremer, D., Zeelenberg, M., Murnighan, J.K., eds.: Social Psychology and Economics. Lawrence Erlbaum Associates (2006) 263–280
  • [12] Eysenck, M.W., Keane, M.T.: Cognitive Psychology: A Student’s Handbook. Psychology Press (2005)
  • [13] Chamberlin, E.H.: An experimental imperfect market. Journal of Political Economy 56 (1948) 95–108
  • [14] Verbrugge, R., Mol, L.: Learning to apply theory of mind. Journal of Logic, Language and Information 17 (2008) 489–511
  • [15] Wason, P.C.: Reasoning. In Foss, B.M., ed.: New horizons in psychology. Penguin Books (1966) 135–151
  • [16] Oaksford, M., Chater, N.: The probabilistic approach to human reasoning. Trends in Cognitive Sciences 5 (2001) 349–357
  • [17] Kahneman, D., Slovic, P., Tversky, A.: Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press (1982)
  • [18] Kareev, Y.: Not that bad after all: Generation of random sequences. Journal of Experimental Psychology: Human Perception and Performance 18 (1992) 1189–1194
  • [19] Tagiew, R.: Hypotheses about typical general human strategic behavior in a concrete case. In: AI*IA, Springer (2009) 476–485
  • [20] Gal, Y., Pfeffer, A.: A language for modeling agents’ decision making processes in games. In: AAMAS, ACM Press (2003) 265–272
  • [21] Kripke, S.: Semantical considerations on modal logic. Acta Philosophica Fennica 16 (1963) 83–94
  • [22] Gal, Y., Pfeffer, A.: Modeling reciprocal behavior in human bilateral negotiation. In: AAAI, AAAI Press (2007) 815–820
  • [23] Rutledge-Taylor, M.F., West, R.L.: Cognitive modeling versus game theory: Why cognition matters. In: ICCM. (2004) 255–260
  • [24] Gluck, K.A., Pew, R.W., Young, M.J.: Background, structure, and preview of the model comparison. In Gluck, K.A., Pew, R.W., eds.: Modeling Human Behavior with Integrated Cognitive Architectures. Lawrence Erlbaum Associates (2005) 3–12
  • [25] Taatgen, N., Lebiere, C., Anderson, J.: Modeling paradigms in ACT-R. 29–52
  • [26] Mitchell, T.M.: Machine Learning. McGraw-Hill Higher Education (1997)
  • [27] Witten, I.H., Frank, E.: Data Mining. Morgan Kaufmann (2005)
  • [28] Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods – Support Vector Learning, MIT Press (1999) 185–208
  • [29] Marchiori, D., Warglien, M.: Predicting human interactive learning by regret-driven neural networks. Science 319 (2008) 1111–1113
  • [30] Kohavi, R.: The power of decision tables. In: ECML, Springer (1995) 174–189