A Robot that Learns Connect Four Using Game Theory and Demonstrations

by   Ali Ayub, et al.
Penn State University

Teaching robots new skills using minimal time and effort has long been a goal of artificial intelligence. This paper investigates the use of game theoretic representations to represent and learn how to play interactive games such as Connect Four. We combine aspects of learning by demonstration, active learning, and game theory allowing a robot to learn by presenting its understanding of the structure of the game and conducting a question/answer session with a person. The paper demonstrates how a robot can be taught the win conditions of the game Connect Four and its variants using a single demonstration and a few trial examples with a question and answer session led by the robot. Our results show that the robot can learn any arbitrary win conditions for the Connect Four game without any prior knowledge of the win conditions and then play the game with a human utilizing the learned win conditions. Our experiments also show that some questions are more important for learning the game's win conditions.




Is that you, Dr. Falken?






The objective of our larger research program is to develop the computational underpinnings and algorithms that will allow a robot to learn how to play an interactive game such as Uno, Monopoly, or Connect Four from a child. We are motivated by potential applications in hospitals and long-term care facilities for children. Moreover, playing interactive games such as these has been shown to contribute to social development [17, 12, 6]. Our intent is to create the underlying theory and algorithms that will allow a child to teach a robot to play the games that the child wants to play. These games may contain nuanced and individualized rules that change and vary with each child or game played.

We borrow computational representations from game theory to address this problem. Game theory has been used to formally represent and reason about a number of interactive games such as Snakes and Ladders, Tic-Tac-Toe, and versions of Chess [4]. Game theory offers a collection of mathematical tools and representations that typically examine questions of strategy during an interaction or series of interactions. The term game is used to describe the computational representation of an interaction or series of interactions. Game theory provides a variety of different representations, but the two most common are the normal-form game and the extensive-form game (described in greater detail below). We use the term "interactive game" to indicate a series of interactions that happen through a board, cards, or play style and that have predefined rules, actions, winners, and losers. Given this terminology, game theory provides computational representations (games) that can be used to represent interactive games.

Using representations from game theory has advantages and disadvantages. On the positive side, game theory has an extensive history of representing a wide variety of interactive situations, ranging from contract negotiations [16] to the evolution of bacteria [14, 5]. Moreover, game-theoretic representations have been designed to capture the information needed to formally represent an interaction. Finally, representing interactions as game-theoretic games allows one to apply the tools and results from game theory as needed, for example calculating a Nash equilibrium in order to inform one's play. On the other hand, game-theoretic representations do not always predict human behavior [10] and are not easily learned solely from data [11].

This paper focuses on developing the computational underpinnings necessary for a robot to play the interactive game Connect Four and its variants. We have chosen this game because: 1) it is physically easy for a real robot to play; 2) the rules are simple enough that a preteen child could learn or teach the game; and 3) the game is still complex with approximately 1.6 x 10^13 board positions. We believe that the methods developed in this paper will also work for other games and hope to show the general applicability of these techniques in future work, although some initial progress on this topic has already been made [24, 3].

We seek to develop a system that learns how to play the game by asking people questions about the game. We assume that the robot knows what the game pieces are and how to use them. The focus of this paper is thus on the robot learning the win conditions for the game (i.e. how to win). Our approach leverages the robot's developing representation of the game to guide active learning. Specifically, an evolving game tree indicates to the robot the questions that it must ask in order to gain enough knowledge about the structure of the game to be able to play it. Often when one person teaches another person how to play a game, they begin by explaining how one wins. This information is then reinforced with practice rounds of play. Our goal is to develop the computational underpinnings that will allow the robot to learn the win conditions well enough to begin playing, even if the full structure of the game has not been learned. The main contributions of this paper are:

  1. A novel approach that uses the evolving game-tree representation of the game Connect Four to ask a user questions in order to learn the game's win conditions.

  2. An approach that can be used to learn different win condition patterns on the Connect Four board in addition to the four win conditions of Connect Four (column, row, diagonal, anti-diagonal).

  3. An experimental analysis that quantifies the importance of different questions for learning the win conditions on the Connect Four board.

Related Work

The field of artificial intelligence has a long history of developing systems that can play [9, 25] and learn games [15, 23]. Recently, significant progress has been made developing systems capable of mastering games such as Chess, Poker, Shogi (Japanese chess) and Go using deep reinforcement learning techniques [21, 20, 22, 27]. State-of-the-art methods in deep reinforcement learning have also been used to train autonomous agents to play a variety of ATARI and other games [8]. While deep reinforcement learning clearly provides a method for learning how to strategically play a game, learning requires large amounts of training data and is fundamentally non-interactive [29]. Interpersonal game learning, on the other hand, is an interactive process involving limited data and examples, and play must begin before the structure of the game is fully known in order to maintain the other person's attention and interest. Moreover, with children in particular, rules change dynamically in order to make play more favorable and exciting for the child. Data-driven retraining may not be possible or desirable in this situation.

Deep learning-based meta-learning has been proposed as a means for managing the problem of large training time and massive data sets [29, 7, 13, 18]. Although these approaches can learn how to do a task by watching just one or a few demonstrations, the new task has to be very similar to the task that the robot was originally trained on; for example, a robot trained on picking objects will not be able to learn how to place an object. Moreover, the initial meta-learning phase to train the robot on the same task still requires a large amount of data and time. Hence, the problem of using guided interaction with a human to teach the robot a new concept remains unsolved. Lastly, to our knowledge, no meta-learning approach exists for learning interactive games by watching just a single demonstration, although researchers have used meta-learning to learn goal-oriented tasks such as kitchen serving tasks [28] and visual navigation in novel scenes [26].

Active learning describes the general approach of allowing a machine learner to actively seek information from a human about particular data in order to improve performance with less training [19]. Typically active learning is framed around a supervised learning task involving labeled and unlabeled data. There are a number of different active learning strategies, the membership query strategy being most related to our work [1, 2]. For this active learning strategy the learner generates queries to a human focused on specific instances of data. As described below, one contribution of this paper is our use of game-theoretic representations to assist with the generation of queries directed at the human.

Using Game Theory to Represent Interactive Games

Figure 1: One stage of the extensive-form representation of the Connect Four game. The upper node shows the current game state after player 1 chose an action, the lower nodes depict game states after the seven possible actions are taken by player 2.

As depicted in our previous work [3], an interactive game in which players take alternating turns (like Connect Four) can be represented as an extensive-form game (Figure 1). In Connect Four, players place round game chips in a 7x6 vertical board. It is a perfect information game because at each stage both players have complete information about the state of the game, the actions taken by the other player, and the actions available to the other player in the next stage. In each turn, a player chooses a column and places one of their colored chips in it; hence in each turn a player has a maximum of seven actions available. Figure 1 shows one stage of the extensive-form representation of the game.
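As a rough illustration of the single stage shown in Figure 1, the following sketch expands one node of the tree into its child states. It is not the authors' implementation: the board encoding (a 6-row by 7-column list of lists, 0 for empty, 1 and 2 for the players, row 0 at the top) and the names `legal_actions`, `apply_action`, and `expand_stage` are assumptions made for this example.

```python
ROWS, COLS = 6, 7   # assumed encoding: 6 rows, 7 columns, row 0 is the top row
EMPTY = 0


def empty_board():
    return [[EMPTY] * COLS for _ in range(ROWS)]


def legal_actions(board):
    """Columns that still have room, i.e. whose top cell is empty."""
    return [c for c in range(COLS) if board[0][c] == EMPTY]


def apply_action(board, column, player):
    """Return a new board with the player's chip dropped into the column."""
    new_board = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):      # scan from the bottom row upward
        if new_board[r][column] == EMPTY:
            new_board[r][column] = player
            return new_board
    raise ValueError(f"column {column} is full")


def expand_stage(board, player):
    """One stage of the tree: map each legal action to the resulting state."""
    return {a: apply_action(board, a, player) for a in legal_actions(board)}


root = apply_action(empty_board(), 3, 1)   # player 1 has just played column 3
children = expand_stage(root, 2)           # player 2's possible replies
print(sorted(children))                    # [0, 1, 2, 3, 4, 5, 6]
```

A full column simply drops out of `legal_actions`, which is why seven is a maximum rather than a fixed number of actions per turn.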

Images of a Connect Four game (Fig. 2 left) can be directly translated into a matrix format (Fig. 2 middle) indicating which player has pieces occupying specific positions in the matrix. The matrix format simply encodes the piece positions of the players in the Connect Four board. This matrix can be used to generate the possible extensive-form games (Fig. 2 right). The extensive-form representation can also be translated back into matrices and used to predict what different game states should look like or, as described later, presented to a person as a possible win condition for verification. The back-and-forth conversion between the extensive-form game and the matrix representation is based on the action-state relationship of the game Connect Four i.e. the column number of the chip in the board represents the type of action taken by the player. Functions to convert to and from the extensive-form game and matrix representation were pre-programmed.

Figure 2: A column win condition in column 5 for the Connect Four game seen from the robot’s perspective is shown above (left). The corresponding extensive-form representation is shown on the right. The numbers along with the arrows show the action number chosen by the players (5 by the human and ? by the robot since robot’s actions are unknown). Best viewed in color.
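The conversion between the matrix and the players' actions can be sketched as follows. This is a minimal illustration under assumed conventions (a 6x7 list of lists, 0 for empty, 1 and 2 for the players, row 0 at the top); the function names `board_to_actions` and `actions_to_board` are hypothetical and are not the pre-programmed functions mentioned above.

```python
ROWS, COLS = 6, 7   # assumed encoding: row 0 is the top of the board
EMPTY = 0


def board_to_actions(board):
    """Read each player's actions (column indices) off a static board.

    Chips are read column by column, bottom to top. A static board does not
    record the order of moves across columns, so only the set of actions
    per player is recovered, not the exact move sequence.
    """
    actions = {1: [], 2: []}
    for c in range(COLS):
        for r in range(ROWS - 1, -1, -1):
            chip = board[r][c]
            if chip != EMPTY:
                actions[chip].append(c)
    return actions


def actions_to_board(moves):
    """Replay a sequence of (player, column) moves, letting chips fall."""
    board = [[EMPTY] * COLS for _ in range(ROWS)]
    for player, column in moves:
        for r in range(ROWS - 1, -1, -1):
            if board[r][column] == EMPTY:
                board[r][column] = player
                break
        else:
            raise ValueError(f"column {column} is full")
    return board


# A column win in column 5, as in the demonstration of Figure 2.
demo = actions_to_board([(1, 5)] * 4)
print(board_to_actions(demo))   # {1: [5, 5, 5, 5], 2: []}
```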

Learning the Win Conditions of Connect Four

A win condition is a terminal game state in which all players win or lose the game. We focus on learning these conditions because doing so is necessary for being able to play the game with purpose. For Connect Four, the rules state that selecting actions that create a pattern of four of the same colored chips in a row, column, diagonal, or anti-diagonal is a win for the player who owns those chips. Players can also draw by filling up the game board without winning. A win condition is represented as a terminal node (a leaf) in the game tree, where one of the players wins the game. All games have some finite set of terminal nodes. The ways to win, lose, or draw a game partition the set of terminal nodes based on the rules of the game.

Pre-win Condition Learning Tasks

Prior to learning a game's win conditions, the robot first needs to know the complete structure of the game, i.e. the possible actions available to each player at each stage of the game. To learn this from a human, the robot first asks two questions that allow it to generate a basic game structure. The two questions are: "How many players can play this game?" and "Is this a type of game in which players take alternating turns?" These questions allow the robot to generate a generic game tree that iterates among the different players. We believe that these questions will be necessary to learn any type of game. For Connect Four the answers to the two questions are "two" and "yes", respectively. The robot also needs to learn about the components of the game such as the look of the game board, the game chips and their associated colors, and how to physically perform the actions related to the game. We currently assume that this information is preprogrammed and can be loaded once the robot knows the name of the game. For Connect Four we used code available online (https://sdk.rethinkrobotics.com/wiki/Connect_Four_Demo), which includes the tools for creating the requisite robot behaviors and identifying the game pieces. This preprogrammed information includes:

  • How to physically perform all of the possible actions {}

  • How to convert a game image into the matrix format of the game state (see Figure 2).

In the future we hope to also have the robot learn this information.

Reasoning with the Game Tree to Ask the Right Questions

From the initial information the robot has the complete structure of the game in the form of an extensive-form game tree. The only things missing from the structure are the win conditions, i.e. the terminal nodes in the game tree that lead to a win for a player.

To learn the win conditions of Connect Four and its variants, we use ideas from learning from demonstration and active learning. As a first step the robot asks for a single demonstration of a win condition from the human teacher by stating, "Can you please show me a way to win?" It then waits for the person to state, "I am done." Next the robot converts the visual information obtained (an image of the static board) into an extensive-form game. For example, Figure 2 depicts the extensive-form representation of a column win in column 5. Note that this demonstration is not an actual game state, as it does not depict the red player's moves. Because the robot knows that play alternates between the two players (from the extensive-form representation of Connect Four), it marks the moves of the red player as unknown (symbolized as question marks in Figure 2).

The initial game tree that exists after the demonstration (Figure 2 right) is clearly missing information. Moreover, the initial tree assumes that player 1 (P1) makes the first move. The demonstration also depicts only a single column win, yet a column win can be achieved in any other column. In general, the demonstration shown by the human corresponds to a single game tree branch that leads to a terminal node where P1 wins, but there are a huge number of other game tree branches that lead to a column win, i.e. to similar terminal nodes. Asking whether each game tree branch is a win condition is not feasible. The robot thus relies on the extensive-form representation of the game to deduce what information is missing from the given demonstration. It can then ask the human about the missing information and, with fewer questions, learn all the tree branches that could lead to a win condition (terminal node) based on the demonstration. From any given demonstration of a win condition (for example Figure 2), the following information elements are available:

  • Given Information: Winning player’s actions (for Figure 2, these actions are {5,5,5,5}), other player’s actions (optional) (not given for example in Figure 2)

  • Missing Information: Other player's actions (missing in Figure 2), other actions by either player on the board that do not affect the win condition (missing in Figure 2)

  • Assumptions: Root of the game tree (In Figure 2, it is assumed by the robot that P1 takes the first action in the game)
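To make these three information elements concrete, the following sketch builds the single tree branch implied by a demonstration. The representation (a list of `(player, action)` pairs with `None` standing for an unknown action) and the function name `demonstration_to_branch` are assumptions made for illustration, not the paper's data structures.

```python
UNKNOWN = None   # placeholder for an action the demonstration did not show


def demonstration_to_branch(winner_actions, winner=1, other=2, first_player=1):
    """Interleave the winner's known actions with unknown replies.

    winner_actions: the given information (e.g. [5, 5, 5, 5] for Figure 2).
    first_player:   the assumption about who moves at the root of the tree.
    The other player's actions are the missing information, so each of
    their turns is filled with UNKNOWN.
    """
    branch = []
    for action in winner_actions:
        if first_player == winner:
            branch.append((winner, action))
            branch.append((other, UNKNOWN))
        else:
            branch.append((other, UNKNOWN))
            branch.append((winner, action))
    # The branch ends at the winner's last move, so drop a trailing reply.
    if branch and branch[-1][0] == other:
        branch.pop()
    return branch


branch = demonstration_to_branch([5, 5, 5, 5])
print(branch)
# [(1, 5), (2, None), (1, 5), (2, None), (1, 5), (2, None), (1, 5)]
```

Changing `first_player` to 2 yields the alternative branch in which the other player opens the game, which is the assumption the robot still has to confirm.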

| Question Type | Example Questions |
|---|---|
| Specific to P1 actions | Confirm the total number of actions needed by P1 to win the game; confirm whether the actions for P1 can be translated in the game tree (Translate is defined in Table 2). |
| Specific to P2 actions | What actions can be taken by P2 such that P1 still achieves the win condition shown in the demonstration? |
| Either player's actions | What other possible actions can be taken by either player on the game board such that P1 achieves the win condition shown in the demonstration? |

Table 1: The robot asks questions about the winning player's (P1) actions, the losing player's (P2) actions, and any other actions taken by the players to learn all the possible win branches that lead to the demonstrated win condition. All these questions are guided by the information elements available from the win condition demonstration and the preprogrammed knowledge about the game structure.

Based upon these information elements available from the game tree, the robot needs to learn the missing information from the demonstration, confirm the assumptions, and learn the general rules underlying the given information. These information elements are essentially related to the types of actions that the winning player (P1) and the losing player (P2) can take such that the tree branch leads to a win for P1. Table 1 shows the different questions that the robot needs to ask about both players' actions to learn the additional information elements about the demonstrated win condition. Instead of asking the questions verbally (which requires a complete dialogue manager), here we present a way for the robot to leverage its ability to convert back and forth between the game state and the game tree. In a separate work, we present a dialogue manager that allows a robot to communicate with a human using verbal and visual questions to learn the win conditions of Connect Four [30].

To ask about a specific information element, the robot manipulates the game tree representation of the demonstrated win condition to represent an example situation related to the information that the robot needs to confirm. The robot then converts the manipulated game tree into a game state image and shows it to the human, accompanied by a simple yes/no question to confirm whether the example game situation is a win. With a single yes/no answer about the example situation, the robot obtains a label from the human for all the game tree branches related to the information it wants to confirm, indicating whether they lead to a terminal node. Table 2 shows a list of functions available to the robot to manipulate the game tree.

| Function Name | Meaning |
|---|---|
| Translate(,,) | Change all actions of a player in the game tree by an offset such that the new actions are between 0 and 6. |
| AddAction(,) | Add an action for a player |
| RemoveAction(,) | Remove an action for a player |

Table 2: List of functions available to the robot to manipulate the game-theoretic representation of a demonstrated win condition
Figure 3: A block diagram of our approach to learn the win conditions of the Connect Four game.

Since the robot only asks yes/no questions, it can take multiple example situations for the robot to confirm a single information element. For example, for the demonstration shown in Figure 2, to confirm the types of actions P2 can take such that P1 still wins, the robot starts with a general question, e.g. can P2 take any action in the game tree? The answer is of course no, because if P2 takes action 5 (chooses column 5) in its first turn, P1 will not achieve a column win in column 5. Hence, the robot asks further clarifying questions to confirm that, for P1 to achieve a column win, P2 can take all the actions except the ones that are the same as P1's actions (i.e. action 5). This leads to a hierarchical set of questions asked by the robot, moving from general to more specific questions. These questions are asked in a visual manner as described above.

Figure 4: The hypothesized game tree generated after changing one action of player 1 in the game tree of Fig. 2 (left). The associated game state image is shown on the right. The matrix format is from the robot’s perspective but the game state image is for the human’s perspective. Best viewed in color.

Our overall approach for learning the game's win conditions is depicted in Figure 3. The robot starts with a demonstration and continues to ask the human questions until it has confirmed all the information elements (Table 1) that need to be learned about the demonstrated win condition. This process can also be terminated early if the robot reaches a pre-defined limit on the number of questions (we set it at 15 questions per win condition for the experiments in this paper).
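The control flow of this process can be sketched as a simple loop. This is only a schematic under assumptions: the `question_plan` structure, the `ask` callback, and the function name `run_session` are hypothetical, and the real system generates each example situation from the game tree rather than reading it from a fixed plan.

```python
MAX_QUESTIONS = 15   # per-win-condition limit used in the experiments


def run_session(question_plan, ask, max_questions=MAX_QUESTIONS):
    """Ask yes/no questions until every element is covered or the limit hits.

    question_plan: dict mapping an information element (e.g. "P2 actions")
                   to an ordered list of example situations, general first.
    ask:           callable taking a situation and returning True or False.
    Returns (answers, number_of_questions_asked, completed).
    """
    answers = {}
    asked = 0
    for element, situations in question_plan.items():
        answers[element] = []
        for situation in situations:
            if asked >= max_questions:
                return answers, asked, False     # terminated early
            answers[element].append((situation, ask(situation)))
            asked += 1
    return answers, asked, True


# Toy plan with placeholder situations; the teacher answers "no" to "q1".
plan = {"P1 actions": ["q1", "q2"], "P2 actions": ["q3"], "other actions": ["q4"]}
answers, asked, completed = run_session(plan, ask=lambda s: s != "q1")
print(asked, completed)   # 4 True
```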

To show how the robot asks the human questions, we walk through an example session for one of the questions specific to P1's actions: confirming whether the actions for P1 can be translated in the game tree (Table 1). For this example, we consider the column win demonstration shown in Figure 2. To learn this information from the human, the robot first confirms whether the numerical relationship among all the P1 actions matters, i.e. whether all the P1 actions have to be 5. Since the translate operation (Table 2) changes all the actions by a particular offset, a question about translating all the actions is not needed if any action can be taken by a player for a win. To confirm this, the robot creates a hypothetical game tree by calling the functions RemoveAction(5,1) and AddAction(3,1) in sequence to change one of P1's actions and then converts the manipulated game-theoretic structure to a game-state image (Figure 4). For the given demonstration, the answer to the accompanying question will be no. Hence, the robot confirms that all the actions of P1 have to be 5. Next, using the game-theoretic structure of Connect Four, the robot infers that the siblings of action 5 (columns 0-6 except 5) can also lead to a similar win, i.e. P1's actions can be translated in the tree by an offset. To confirm this inference, the robot calls the function RemoveAction(5,1) four times to remove all the actions for P1 and then calls the function AddAction(6,1) four times to add four actions for P1 in column 6. The manipulated game-theoretic structure is then converted to a game-state image (Figure 5). For the given demonstration, the answer to the accompanying question will be yes. Hence, the robot confirms an information element about P1's actions using two example situations. Similarly, the robot confirms all the other question types from Table 1.
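The two example situations in this session can be reproduced with a small sketch of the Table 2 functions. The branch representation (a list of `(player, action)` pairs) is an assumption, as is the argument order of `translate`; the `(action, player)` order of `add_action` and `remove_action` follows the calls RemoveAction(5,1) and AddAction(3,1) quoted above. None of this is the authors' code.

```python
NUM_COLUMNS = 7   # actions are column indices 0-6


def add_action(branch, action, player):
    """Return a copy of the branch with one more action for the player."""
    return branch + [(player, action)]


def remove_action(branch, action, player):
    """Return a copy with the first matching action of the player removed."""
    new_branch = list(branch)
    new_branch.remove((player, action))   # raises ValueError if not present
    return new_branch


def translate(branch, player, offset):
    """Shift every action of the player by offset, staying within 0-6."""
    shifted = []
    for p, a in branch:
        if p == player:
            a += offset
            if not 0 <= a < NUM_COLUMNS:
                raise ValueError("translated action leaves the board")
        shifted.append((p, a))
    return shifted


demo = [(1, 5)] * 4   # P1's actions in the column-5 demonstration

# First situation (Figure 4): change one of P1's actions from 5 to 3.
q1 = add_action(remove_action(demo, 5, 1), 3, 1)

# Second situation (Figure 5): move all four of P1's actions to column 6.
q2 = demo
for _ in range(4):
    q2 = remove_action(q2, 5, 1)
for _ in range(4):
    q2 = add_action(q2, 6, 1)

print(q1)                           # [(1, 5), (1, 5), (1, 5), (1, 3)]
print(q2 == translate(demo, 1, 1))  # True
```

The second situation is equivalent to a translation of P1's actions by an offset of 1, which is the inference the robot is checking.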

It should be noted that for board games like Connect Four, the game state can sometimes provide better representation of a win condition than the game-theoretic structure but the game-state representation is dependent upon a particular game. Furthermore, it is easier to reason from the game-theoretic structure than the game-state. Because of this inherent generality of the game-theoretic format to represent any interactive game, our learning algorithm only relies on this representation of interactive games for asking questions and learning about the win conditions. We plan to show in our future work that the same approach can be used to learn other more complex board games (like Gobblet and Quarto) as well.

Figure 5: The hypothesized game tree generated after changing all the actions of player 1 to column 6 in the game tree of Fig. 2 (left). The matrix format is from the robot’s perspective but the game state image is for the human’s perspective. The associated game state image is shown on the right. Best viewed in color.


Figure 6: Fifty different patterns that were learned by the robot as win conditions on the Connect Four board. Only the yellow chips in the patterns are parts of the win conditions, the red chips are simply to create an offset just like in case of diagonal and anti-diagonal win conditions. Best viewed in color.

To evaluate this system, we used the Baxter robot manufactured by Rethink Robotics. Google's text-to-speech API was used to communicate questions in natural language to the person. The person answered the questions by typing inputs into a computer to avoid errors induced by the speech-to-text conversion process. The experimenter served as the robot's interactive partner for all of the experiments, unless stated otherwise.

Learning the Four Win Conditions of Connect Four

We hypothesized that the process described in the previous sections would allow the robot to learn the four Connect Four win conditions (four game pieces in a row, column, diagonal, or anti-diagonal). We tested the process by providing the robot with a single correct demonstration of one type of win condition (e.g. a column win), after which a human correctly answered the robot's questions about the self-generated game situations ("Is this a win for yellow?"). We repeated this process for the other types of win conditions (row, diagonal and anti-diagonal). Next, the robot's ability to use the win conditions to play the game was tested in a real game against a human opponent. We verified that the robot could correctly use the win conditions it had learned by playing 10 games against the experimenter. The robot used a depth-2 minimax strategy to play all 10 games. Out of the 10 games, the robot won 7, lost 1, and drew 2. We believe it lost a game because the depth-2 minimax strategy only provides the best move for the next stage of the game, not the overall optimal move. Of the 7 wins, 2 were diagonal wins, 3 were anti-diagonal wins, and 2 were column wins. The robot encountered a diagonal win in the one game it lost. For all these games the robot correctly applied the win conditions and demonstrated its ability to correctly identify whether it or the person had won the game. These experiments verify that the robot could learn the win conditions from a single demonstration and by using question and answer to present the person with different game situations, ultimately arriving at a set of extensive-form games constituting a win.
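A depth-2 minimax player of the kind described here can be sketched as follows. This is an illustrative stand-in rather than the code used in the experiments: the board encoding, the scoring (+1 win, -1 loss, 0 otherwise), and the hard-coded `four_in_line` test, which takes the place of the learned win conditions, are all assumptions.

```python
ROWS, COLS, EMPTY = 6, 7, 0   # assumed encoding: row 0 is the top row


def drop(board, col, player):
    """Return a new board after the move, or None if the column is full."""
    new_board = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new_board[r][col] == EMPTY:
            new_board[r][col] = player
            return new_board
    return None


def four_in_line(board, player):
    """Stand-in win test: four chips in a row, column, or either diagonal."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS
                       and board[rr][cc] == player for rr, cc in cells):
                    return True
    return False


def minimax(board, depth, to_move, me, is_win):
    """Score a position from the point of view of player `me`."""
    if is_win(board, me):
        return 1
    if is_win(board, 3 - me):
        return -1
    if depth == 0:
        return 0
    scores = []
    for col in range(COLS):
        child = drop(board, col, to_move)
        if child is not None:
            scores.append(minimax(child, depth - 1, 3 - to_move, me, is_win))
    if not scores:
        return 0                      # board is full: a draw
    return max(scores) if to_move == me else min(scores)


def best_move(board, me, is_win, depth=2):
    """Pick the column with the best score, looking `depth` moves ahead."""
    best_col, best_score = None, -2
    for col in range(COLS):
        child = drop(board, col, me)
        if child is None:
            continue
        score = minimax(child, depth - 1, 3 - me, me, is_win)
        if score > best_score:
            best_col, best_score = col, score
    return best_col


board = [[EMPTY] * COLS for _ in range(ROWS)]
for _ in range(3):
    board = drop(board, 2, 2)         # opponent threatens a win in column 2
print(best_move(board, 1, four_in_line))   # 2 (the blocking move)
```

Because the search stops after the robot's move and one reply, it can take an immediate win or block an immediate threat, but it cannot see traps that need more than two moves to unfold, which is consistent with the explanation given for the lost game.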

Learning Variants of Connect Four

To verify that our method is not limited to the four win conditions prescribed by the Connect Four game (patterns of four in a row, column, diagonal or anti-diagonal), the robot's ability to learn different patterns representing different ways to win was tested. We hypothesized that our system could learn an arbitrary pattern as a win condition and use this pattern to play a modified version of the game. To test this hypothesis, fifty different patterns were demonstrated to the robot as win conditions on the Connect Four game board (Figure 6). The experimenter then answered the corresponding questions for each of the demonstrated win conditions. Once these questions were answered, the robot's ability to use the learned win conditions was tested in 10 games per rule (a total of 500 games). In these games, both the robot and the experimenter took random actions, and the games ended after an average of 20 turns. Since the experimenter and the robot both took random actions, instead of checking the robot's ability to play and win using the learned win conditions, we simply checked the robot's ability to recognize the learned win condition when it was reached by either the experimenter or the robot. In all 500 games, the robot was able to recognize the learned win condition, which shows that the robot successfully learned each of the different win conditions on the Connect Four board. We have already shown in the previous experiment that if the robot learns a win condition successfully, it can use the minimax strategy to play against a human user. Future user studies will evaluate how well the robot can use the win conditions it has learned to play. This experiment verified the generic ability of our approach to learn various home-made win conditions for a game as long as the structure of the game (board, game pieces, actions available to players in a turn, etc.) is known. We hope to learn these elements in the future using a dialogue manager.
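Recognizing an arbitrary learned pattern on the board can be illustrated with a simple board-level check. This sketch is a simplification and not the paper's method, which stores win conditions as extensive-form game branches: here a pattern is a set of cell offsets, and it is assumed that the pattern may appear anywhere on the board. The names `pattern_from_demo` and `matches` are hypothetical.

```python
ROWS, COLS, EMPTY = 6, 7, 0   # assumed encoding: row 0 is the top row


def pattern_from_demo(board, player):
    """Extract the player's chips as offsets from their bounding-box corner."""
    cells = [(r, c) for r in range(ROWS) for c in range(COLS)
             if board[r][c] == player]
    if not cells:
        raise ValueError("the demonstration contains no chips for this player")
    r0 = min(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    return frozenset((r - r0, c - c0) for r, c in cells)


def matches(board, player, pattern):
    """True if the pattern occurs anywhere on the board for this player."""
    height = max(r for r, _ in pattern) + 1
    width = max(c for _, c in pattern) + 1
    for r0 in range(ROWS - height + 1):
        for c0 in range(COLS - width + 1):
            if all(board[r0 + dr][c0 + dc] == player for dr, dc in pattern):
                return True
    return False


def board_with(cells, player=1):
    board = [[EMPTY] * COLS for _ in range(ROWS)]
    for r, c in cells:
        board[r][c] = player
    return board


# An L-shaped pattern demonstrated in the bottom-left corner.
demo = board_with([(5, 0), (5, 1), (5, 2), (4, 0)])
pattern = pattern_from_demo(demo, 1)

shifted = board_with([(5, 3), (5, 4), (5, 5), (4, 3)])
partial = board_with([(5, 3), (5, 4), (5, 5)])
print(matches(shifted, 1, pattern), matches(partial, 1, pattern))  # True False
```

Whether a pattern may really be shifted in a given direction is exactly what the robot's translation questions establish, so a faithful implementation would restrict the search in `matches` according to the teacher's answers.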

Importance of Different Question Types

For the three question types in Table 1, the robot asks a maximum of 11 questions to learn any win condition pattern on the Connect Four board. Among these 11 questions, a maximum of 4 questions are asked about P2 actions, a maximum of 4 questions are asked about P1 actions (2 for confirming the minimum number of actions required for a win and 2 for confirming the translation of P1 actions in the tree), and a maximum of 3 questions are asked about other actions taken by either player in the game. We conducted a final experiment to evaluate the importance of each question type for learning the four win conditions of Connect Four.

Hypothesis: All three question types are required to learn all the win conditions of Connect Four.

Experimental Setup: The robot learned the four win conditions of Connect Four in different interactions with one of the question types removed during each interaction. For the questions specific to P1 actions, we further divided them into two groups: to confirm minimum number of actions required for a win and translation of P1 actions. Hence, the robot was taught each win condition in four different interactions and in each interaction one of the question types was not confirmed by the robot (a total of 4*4=16 interactions). After learning each win condition in an interaction, the robot played a total of 30 games with a simulated opponent (total 4*4*30 = 480 games). Both robot and the opponent took random actions in their turns.

Evaluation: Since both players took random actions, for each of the games the robot's ability to detect the correct win condition was tested. Table 3 shows the robot's ability to detect each win condition after removing different question types from the interaction. It is clear that the most important questions are related to the P1 actions for all the win conditions. The effect of P2's actions on the win condition learning is also quite drastic. For other actions taken by either player, column win is least affected by that (probably because of its simplicity) but all the other win conditions are affected by a significant margin. These results confirm our hypothesis i.e. all question types are necessary for the robot to learn all the win conditions on the Connect Four board but questions specific to P1 actions are the most important.


In this paper we have shown how game-theoretic representations of interactive games can be utilized as a means for learning the win conditions of the games. We have presented a preliminary method for using a game tree to generate hypothetical game situations that are then presented to a person in order to learn about the game. This paper presents experiments showing that a single demonstration accompanied by a few directed questions and answers can be used to learn arbitrary win conditions for the game Connect Four. We believe that the proposed approach can also be used to learn other games and possibly as a general means for representing interactions between a human and a robot. Ultimately, we believe that this avenue of research may offer a means for a robot to structure its interactions with a person, allowing the robot to bootstrap an interactive exchange by using similar experiences represented as an extensive-form game as a model for other upcoming interactions.

| Question type removed | Row | Column | Diagonal | Anti-diagonal |
| --- | --- | --- | --- | --- |
| Min. number of actions for a win | 0% | 0% | 0% | 0% |
| Translation of P1 actions | 0% | 0% | 0% | 0% |
| Effect of P2 actions | 13.34% | 16.67% | 10% | 10% |
| Either player’s actions | 26.67% | 90% | 20% | 16.67% |

Table 3: Detection accuracy (%) of the robot after removing different question types (from Table 1) for the four win conditions of Connect Four.
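As a reading aid for Table 3, the short sketch below converts the reported percentages back into game counts and orders the question types by how badly their removal hurts detection. It assumes that each cell is computed over the 30 games played per interaction, as described in the experimental setup; the variable names and the use of the mean across win conditions as a ranking score are illustrative choices, not part of the original method.

```python
GAMES_PER_CELL = 30  # assumption: each percentage is taken over 30 games

# Detection accuracy (%) from Table 3, in the order
# row, column, diagonal, anti-diagonal.
table3 = {
    "Min. number of actions for a win": [0.0, 0.0, 0.0, 0.0],
    "Translation of P1 actions": [0.0, 0.0, 0.0, 0.0],
    "Effect of P2 actions": [13.34, 16.67, 10.0, 10.0],
    "Either player's actions": [26.67, 90.0, 20.0, 16.67],
}

# Convert each percentage into a whole number of correctly detected games.
counts = {
    question: [round(pct / 100 * GAMES_PER_CELL) for pct in row]
    for question, row in table3.items()
}

# Lower mean accuracy after removal means the question type matters more.
mean_accuracy = {q: sum(row) / len(row) for q, row in table3.items()}
ranked = sorted(table3, key=lambda q: mean_accuracy[q])

for question in ranked:
    print(f"{question}: counts={counts[question]}, "
          f"mean accuracy={mean_accuracy[question]:.2f}%")
```

Under this assumption, the entries correspond to whole-game counts (for example, 90% is 27 of 30 games), and the ordering places the two P1-specific question types first, matching the discussion in the text.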

The problem of learning games through interactions with humans is far from solved, and the current approach has some limitations. We have assumed that the person demonstrates a valid win condition and that they correctly answer the questions posed by the robot. Our experiments have also investigated whether some questions matter more than others for learning a game’s win conditions. Our results show that, indeed, some questions and answers affect the robot’s ability to later play a game more than others. As a result, it may be valuable for the robot to learn the value of different questions so that it can ask the more important questions earlier in an interaction.

This paper suggests several interesting avenues for future research. Perhaps the most obvious is to extend this work to verbal dialog between a human and the robot. It may be possible to use the game tree to ground open-ended answers by the human. This work could also be extended to learn other aspects of playing a game more completely, such as how to perform game actions or use the game components (board, tokens). One goal of this work is to create a complete system that will allow the robot to learn the complete structure of games. A final avenue will be to examine how the rules learned in this game can be transferred to other games. Considering card games, for example, one might use this process to look at different variants of poker or other games. In this case, learning by demonstration could perhaps be used to bootstrap the learning of new games from previously learned ones. Ultimately, we believe that the proposed techniques take us a step closer to robots that can learn to interact across a wide variety of situations.


This work was funded in part by Penn State’s Teaching and Learning with Technology (TLT) Fellowship, and an award from Penn State’s Institute for CyberScience.


  • [1] D. Angluin (1988-04-01) Queries and concept learning. Machine Learning 2 (4), pp. 319–342. Cited by: Related Work.
  • [2] D. Angluin (2001) Queries revisited. In Proceedings of the 12th International Conference on Algorithmic Learning Theory, ALT ’01, London, UK, UK, pp. 12–31. Cited by: Related Work.
  • [3] A. Ayub and A. R. Wagner (2018) Learning to win games in a few examples: using game-theory and demonstrations to learn the win conditions of a connect four game. In Social Robotics, pp. 349–358. Cited by: Introduction, Using Game Theory to Represent Interactive Games.
  • [4] Berlekamp,E., Conway,J. H., and Guy,R. (1982) Winning ways for your mathematical plays: games in general. Academic Press. Cited by: Introduction.
  • [5] S. Bhattacharya and G. Srivastava (2013) GAME of coordination for bacterial pattern formation: a finite automata modelling. International Journal of Mathematical Modelling and Computations 3 (4 (FALL)), pp. 299–316. Cited by: Introduction.
  • [6] D. Buchsbaum, S. Bridgers, D. S. Weisberg, and A. Gopnik (2012) The power of possibility: causal learning, counterfactual reasoning, and pretend play. Philosophical Transactions of the Royal Society B:Biological Sciences 367(1599), pp. 2202–2212. Cited by: Introduction.
  • [7] Y. Cheng, Mo. Yu, X. Guo, and Zhou,B. (2019) Few-shot learning with meta metric learners. arXiv:1901.09890. Cited by: Related Work.
  • [8] A. Dobrovsky, U. M. Borghoff, and M. Hofmann (2016) An approach to interactive deep reinforcement learning for serious games. Cited by: Related Work.
  • [9] Eger,M., C. Martens, and M. Cordoba (2017) An intentional ai for hanabi. pp. 68–75. Cited by: Related Work.
  • [10] L. Gale, D. M. McCubbins, and M. Turner (2015) Against game theory. Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, pp. 1–16. Cited by: Introduction.
  • [11] A. X. Gao and A. Pfeffer (2012) Learning game representations from data using rationality constraints. arXiv:1203.3480 [cs.GT]. Cited by: Introduction.
  • [12] R. Hromek and S. Roffey (2009) Promoting social and emotional learning with games: “it’s fun and we learn things”. Simulation and Gaming 40(5), pp. 626–644. Cited by: Introduction.
  • [13] G. Huang, H. Larochelle, and S. Lacoste-Julien (2019) Centroid networks for few-shot clustering and unsupervised few-shot classification. arXiv:1902.08605 [cs.LG]. Cited by: Related Work.
  • [14] Lambert,G., S. Vyawahare, and R. Austin (2014) Bacteria and game theory: the rise and fall of cooperation in spatially heterogeneous environments. Interface Focus 4(4). Cited by: Introduction.
  • [15] J. S. Louis and C. Miles (2005)

    Playing to learn: case-injected genetic algorithms for learning to play computer games


    IEEE Transactions on Evolutionary Computation

    9(6), pp. 669–681.
    Cited by: Related Work.
  • [16] Osborne,J. M. and A. Rubinstein (1990) Bargaining and markets. Academic Press. Cited by: Introduction.
  • [17] G. B. Ramani and R. S. Siegler (2008) Promoting broad and stable improvements in low‐income children’s numerical knowledge through playing number board games. Child development 79(2), pp. 375–394. Cited by: Introduction.
  • [18] M. Ren, E. Triantafillou, S. Ravi, J. Snell, K. Swersky, B. J. Tenenbaum, H. Larochelle, and R. Zemel (2018) Meta-learning for semi-supervised few-shot classification. arXiv:1803.00676 [cs.LG]. Cited by: Related Work.
  • [19] B. Settles (2009) Active learning literature survey. Cited by: Related Work.
  • [20] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. v. d. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, and I. Sutskever (2016)

    Mastering the game of go with deep neural networks and tree search

    Nature 529, pp. 484–489. Cited by: Related Work.
  • [21] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis (2017) Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv Reprint arXiv:1712.01815. Cited by: Related Work.
  • [22] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis (2018) A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362 (6419), pp. 1140–1144. Cited by: Related Work.
  • [23] S. Thrun (1994) Learning to play the game of chess. pp. 1069–1076. Cited by: Related Work.
  • [24] A. R. Wagner (2016) Using games to learn games: game-theory representations as a source for guided social learning. Cited by: Introduction.
  • [25] Whitehouse,D., I. Cowling, P., and J. E. Powley (2013) Integrating monte carlo tree search with knowledge-based methods to create engaging play in a commercial mobile game. Cited by: Related Work.
  • [26] M. Wortsman, K. Ehsani, M. Rastegari, A. Farhadi, and R. Mottaghi (2019-06) Learning to learn how to learn: self-adaptive visual navigation using meta-learning. In

    The IEEE Conference on Computer Vision and Pattern Recognition

    Cited by: Related Work.
  • [27] K. Xenou, G. Chalkiadakis, and S. Afantenos (2019) Deep reinforcement learning in strategic board game environments. Vol. 11450, pp. 233–248. Cited by: Related Work.
  • [28] T. Yu, P. Abbeel, S. Levine, and C. Finn (2018)

    One-shot hierarchical imitation learning of compound visuomotor tasks

    arXiv:1810.11043 [cs.LG]. Cited by: Related Work.
  • [29] T. Yu, C. Finn, A. Xie, S. Dasari, T. Zhang, P. Abbeel, and S. Levine (2018) One-shot imitation from observing humans via domain-adaptive meta-learning. Cited by: Related Work, Related Work.
  • [30] M. Zare, A. Ayub, A. R. Wagner, and R. J. Passonneau (2019) Show me how to win: a robot that uses dialog management to learn from demonstrations. In Proceedings of the 14th International Conference on the Foundations of Digital Games, New York, NY, USA, pp. 78:1–78:7. Cited by: Reasoning with the Game Tree to Ask the Right Questions.