Automatic Critical Mechanic Discovery in Video Games

09/06/2019 · Michael Cerny Green, et al. · New York University

We present a system that automatically discovers critical mechanics in a variety of video games within the General Video Game Artificial Intelligence (GVG-AI) framework using a combination of game description parsing and playtrace information. Critical mechanics are defined as the mechanics most necessary to trigger in order to perform well in the game. In a user study, human-identified mechanics are compared against system-identified mechanics to verify alignment between humans and the system. The results of the study demonstrate that our method is able to match humans with high consistency. Our system is further validated by comparing MCTS agents augmented with critical mechanic information against baseline MCTS agents on 4 games in GVG-AI. The augmented agents show a significant performance improvement over their baseline counterparts for all 4 tested games, demonstrating that knowledge of the system-identified mechanics is responsible for the improved performance.


Introduction

Tutorials are designed to help a player learn how to play a game. They come in several different forms, such as text instructions (e.g. “press A to jump”), examples where an agent demonstrates what to do (e.g. watching an AI jump), and instructional content, like levels, that gradually introduces core mechanics as the player progresses. Games tend to use a combination of these tutorial styles to teach important features of the game. Previous research in automatic tutorial generation has introduced possible tutorial types [10], methods for tutorial text and demonstration generation [8], and tutorial level generation [9, 12]. However, there has not been a method for identifying the most important, or “critical,” mechanics to teach. We define a “critical mechanic” as one that is crucial to winning, i.e. a mechanic that, when triggered, has a significant positive effect on player performance.

A complicated task, such as playing a video game, can usually be divided into a number of subtasks, each with their own subgoals. For example, leaving a room might involve finding a key, removing any obstacles on the way to the door, getting to the door and opening it. The idea of subdividing a larger task into smaller constituent tasks in order to make it easier to solve is common within both the planning and reinforcement learning literature [19, 1]. One can find similar ideas in this work, where subgoals are restricted to the triggering of specific game mechanics, rather than finding individual game states. This idea mirrors research in game state compression by Cook and Raad [5], where reversible game actions are ignored in favor of unique “irreversible actions” (i.e. actions which the player cannot undo).

Human beings enjoy video games for, among other things, the obvious challenges they provide. Figuring out the game mechanics, what affordances exist, and how to win are all goals players have when playing. Video games can be created with this as a goal: the fun comes from discovery and progress. To win, the player must figure out how to play. Therefore, it is safe to assume that if a human is able to beat a game, they know what mechanics are important to play it. A system that claims to discover critical mechanics in gameplay must therefore identify the same critical mechanics that a human does.

In this paper, we demonstrate the automatic discovery of “critical game mechanics,” using playtraces from humans and/or artificial agents playing these games, and recommend this as a potential module within a tutorial generator. Furthermore, we validate this approach through a user study which shows that humans believe the discovered critical path mechanics are important for achieving strong performance. We also present a new way of incorporating mechanic information into stochastic forward-planning algorithms, such as Monte Carlo Tree Search [4], which we use to further validate how well this method discovers subgoals. We propose that any critical mechanic discovery system could be validated using a combination of such tests.

Background

The following section discusses previous research in the areas of Monte Carlo Tree Search, subgoal discovery in reinforcement learning and hierarchical planning, and the General Video Game AI framework.

Monte Carlo Tree Search (MCTS)

MCTS [4, 16] is a stochastic tree-search algorithm that creates asymmetric trees by expanding more promising nodes more often. It consists of four phases: selection, expansion, simulation, and backpropagation. In the selection phase, the algorithm decides which node should be expanded next using a selection policy, one of the most widely used being UCT [15]. This policy defines how the algorithm chooses between exploring unseen tree nodes and exploiting known ones. During the expansion phase, a new node is added to the tree as a child of the selected node. During the simulation phase, the newly created child node is forward-simulated until it reaches either a terminal state (a win or a loss) or some pre-defined threshold (e.g. 500 moves into the future). Finally, in the backpropagation phase, a reward value is calculated for the simulation phase’s final state and is used to update the values of the visited nodes, from the newly created node up to the tree root. The algorithm runs iteratively, and the updated node values guide the search in the next iteration.
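To make the selection phase concrete, the following sketch shows a standard UCT-style child selection; the node fields (visits, total_reward, children) and the exploration constant are illustrative assumptions rather than details taken from this paper or the GVG-AI implementation.

```python
import math

def uct_value(child, parent_visits, c=math.sqrt(2)):
    """UCT score of a child node: average reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = child.total_reward / child.visits                      # average value so far
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)  # exploration bonus
    return exploit + explore

def select_child(node):
    """Selection phase: descend to the child with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_value(ch, node.visits))
```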

MCTS is commonly augmented for improved performance depending on the environment. Macro actions are one such improvement, where sequences of actions are constructed and performed instead of single ones to cut down the search space [20]. Another MCTS augmentation is a UCT modification called “mixmax”. In this method, the UCT selection equation is changed to factor in the average child value and the maximum child value, and a weighted average of these is used to calculate the final value of a node. This emboldens the agent to take riskier actions that have a higher potential for bad outcomes as long as they have potential for good ones [11].
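The mixmax modification described above can be sketched as follows: the exploitation term becomes a weighted blend of the child's best observed reward and its average reward, so that nodes with a high upside are not drowned out by their average outcome. The weight q and the field names are assumptions chosen for illustration.

```python
import math

def mixmax_value(child, parent_visits, q=0.25, c=math.sqrt(2)):
    """UCT selection with a mixmax exploitation term (blend of max and mean reward)."""
    if child.visits == 0:
        return float("inf")
    mean = child.total_reward / child.visits
    blended = q * child.max_reward + (1 - q) * mean                  # mixmax blend
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return blended + explore
```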

Subgoals in Reinforcement Learning and Planning

One of the earliest mentions of subgoals in reinforcement learning can be found in the work of Singh (1992), who proposed the existence of elemental tasks, i.e. behaviors an agent can achieve that accomplish some conditional goal. By sequentially lining up these elemental tasks, an agent could improve training and generalization.

Subgoal discovery builds on this idea of a state or behavior marking progress along the path to solving a problem: the goal is to automatically derive intermediate reward states that improve performance. Maron’s Diverse Density algorithm [18] was first used for automated subgoal discovery by McGovern and Barto [19]. Asadi and Huber used Monte Carlo sampling in reinforcement learning agents to discover subgoals for faster training [1].

Hierarchical MCTS algorithms [24] typically take advantage of information gathering to automatically find target states that assist in building the search trees of agents, such as UCT and POMDP agents. This approach works particularly well when a Markovian decision process is abstracted into a partially observable one, as this can significantly reduce the state branching factor [2]. IGRES is an example of a randomized partially observable Markov decision process (POMDP) solver which uses subgoal discovery to leverage information about the state space [17]. IGRES is able to cut down on the potential solution space, thus decreasing computation time while maintaining good performance.

It is important to note that the method proposed in this paper is not intended as a contribution to hierarchical planning; rather the hierarchical MCTS experiment is carried out as a secondary way of validating the correctness of the identified critical mechanics.

General Video Game Artificial Intelligence Framework (GVG-AI)

GVG-AI is a framework for general video game playing [22, 21], aimed at exploring the problem of creating artificial players that are able to play a variety of games well. It has an annual competition where AI agents compete and are judged on their performance in games unseen by them beforehand. In the competition, each agent has to decide its next action within 40 milliseconds, provided with a forward model for the current game. The framework’s environment is constantly evolving [21], adding more tracks to the competition, such as a level generation track [14], a rule generation track [13], a learning agents track [23], and a two-player agents track [7].

The GVG-AI framework uses the Video Game Description Language (VGDL) to describe the games it runs [6]. The language is human-readable, simple and compact, but expressive enough to allow for the creation of a wide variety of simple 2D games. Some of them are adaptations of classical games, such as Pacman (Namco 1980) and Sokoban (Imabayashi 1981), while others are brand new games, such as Wait For Breakfast. To write a game in VGDL, one only needs to describe the behaviour of game elements, what happens when they collide, and how to win or lose the game. A VGDL game consists of a game description file and one or more level description files. The game description file contains a Sprite Set, or game objects that can be instantiated, including the sprite’s behavior, images used, etc; an Interaction Set, or a list of how sprites interact; a Termination Set, or what conditions trigger an end to the game; and a mapping between game sprites and the symbols representing them in the level files.
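As a rough illustration of the structure just described, the sketch below gives a simplified Python view of the four parts of a VGDL game description; the specific sprites and effects are hypothetical and are not taken from any particular GVG-AI game.

```python
# Hypothetical, simplified view of a parsed VGDL game description.
game_description = {
    "sprite_set": {                      # game objects that can be instantiated
        "avatar": {"type": "MovingAvatar"},
        "key":    {"type": "Immovable", "img": "key"},
        "door":   {"type": "Immovable", "img": "door"},
    },
    "interaction_set": [                 # what happens when two sprites collide
        ("avatar", "key",  "transformTo", {"stype": "avatarWithKey"}),
        ("door", "avatarWithKey", "killSprite", {"scoreChange": 1}),
    ],
    "termination_set": [                 # conditions that end the game
        {"type": "SpriteCounter", "stype": "door", "limit": 0, "win": True},
    ],
    "level_mapping": {"A": "avatar", "k": "key", "d": "door"},   # level-file symbols
}
```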

System Overview

Our system receives two inputs: a description of the game rules and a series of playtraces of the game. From these two things, it builds a “mechanic graph”, which contains the system’s understanding of the game-play affordances. It inserts playtrace data into this graph, then searches it to generate a “critical path.” We define a critical path as a series of mechanics that need to be triggered in order to reach a winning terminal state [8]. We then augment MCTS agents with these mechanics by modifying their state evaluation function to take into account the occurrence of these mechanics during play. The following subsections further describe the mechanic graph creation, the playtrace informed graph search, and the modifying of an MCTS agent with mechanic information.

Mechanic Graph Generation

The first step of critical path construction involves the mechanics of the game in question. Our system contains the same parser as the AtDelfi [8] system (https://github.com/mcgreentn/GVGAI), which is able to transform VGDL code into a graph of game entities (e.g. sprites and other objects), conditions (e.g. collisions, termination, etc.), and events that occur if these conditions are met (e.g. destroying a sprite, gaining points, etc.). Figure 1 displays a key-collection mechanic as seen by the system after parsing the VGDL to build a mechanic graph.

Figure 1: A pickup-key mechanic in a generated mechanic graph. A player colliding with a key results in the player picking up the key.
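A minimal sketch of how such a node chain might be represented in code, assuming a simple entity/condition/action structure (the field names are ours, not the system's):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entity:
    name: str                        # e.g. "avatar", "key"

@dataclass
class Condition:
    kind: str                        # e.g. "collision", "termination"
    entities: List[Entity]           # sprites involved in the condition

@dataclass
class Action:
    kind: str                        # e.g. "pickup", "destroy", "win"

@dataclass
class Mechanic:
    condition: Condition
    action: Action
    frame: Optional[int] = None      # earliest frame it fired in the chosen playtrace

# The pickup-key mechanic of Figure 1: the avatar colliding with the key picks it up.
avatar, key = Entity("avatar"), Entity("key")
pickup_key = Mechanic(Condition("collision", [avatar, key]), Action("pickup"))
```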

Playtrace-informed Graph Search

After a mechanic graph is created and all possible game mechanics are represented, the system informs the graph with playtrace information, which can be collected from human players or automated agents. Given a collection of playtraces for a single game level, the system looks for the playtrace that (1) contains the smallest number of unique mechanics represented on the graph, and (2) in which the player won the level. In doing so, it infers that the playtrace must contain knowledge of which mechanics must be triggered in order to beat the level. By singling out the playtrace with the fewest unique mechanics, it minimizes gameplay “noise,” such as accidentally walking into walls (which triggers an interaction with the wall) or completing other actions that have nothing to do with winning the game.
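A sketch of this playtrace selection step, assuming each playtrace records whether the level was won and which unique mechanics were triggered (the data layout is an assumption):

```python
def pick_reference_playtrace(playtraces):
    """Choose the winning playtrace that triggers the fewest unique mechanics."""
    winning = [p for p in playtraces if p["won"]]
    if not winning:
        return None  # no critical path can be extracted for this level
    return min(winning, key=lambda p: len(set(p["mechanics_triggered"])))
```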

Each mechanic in the playthrough is linked to the particular game-frame in which it occurred. For each unique mechanic triggered during that playtrace, the system looks for the earliest frame during gameplay when that mechanic occurred and enters it into the corresponding node in the graph. Once this has been done for all mechanics, the system performs a best first search algorithm over the graph, starting from player-centric mechanics (i.e. those that the player either initiates or is otherwise involved in, like colliding with coins or swinging a sword) and ending with a positive terminating one (i.e. winning the game). The cost of a node is that node’s frame value. Thus, the search creates a path of the earliest occurring mechanics. The pseudocode for this process can be found in Algorithm 1.

function findCritPath(graph)
    frontier ← all player-centric mechanic nodes of graph that were triggered in the playtrace
    visited ← empty set
    while frontier is not empty do
        current ← node in frontier with the smallest frame value
        remove current from frontier and add it to visited
        for each neighbor of current do
            if neighbor was triggered and neighbor is not in visited then
                neighbor.parent ← current
                add neighbor to frontier
            if neighbor is a winning termination mechanic then
                neighbor.parent ← current
                return the path from neighbor back to its player-centric starting node
    return no winning path found
Algorithm 1 Finding the Critical Path of a Game

The path through the graph that the system creates to get from an initial player action to a terminating mechanic becomes a list of critical game mechanics. Any “sibling-mechanics,” or mechanics that are nearly identical in nature to ones in the critical path, are also added to the critical mechanic list manually. Sibling mechanics are defined as mechanics that contain identical condition-action pairs, and sprites that are classified in VGDL within the same family. 

(See https://github.com/GAIGResearch/GVGAI/wiki/Sprites.) For example, in GVG-AI’s Zelda, hitting either a bat or a spider with the sword results in that entity’s destruction. If either the bat-sword mechanic or the spider-sword mechanic is contained within the critical mechanic list, the other one will also be included.
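The sibling expansion might look roughly like the following, where two mechanics are treated as siblings when their condition-action pairs are identical and the sprites they differ in belong to the same VGDL family; the helper names and attributes are illustrative assumptions:

```python
def expand_with_siblings(critical_path, all_mechanics, same_family):
    """Add mechanics that mirror a critical mechanic but involve a sibling sprite."""
    expanded = list(critical_path)
    for crit in critical_path:
        for mech in all_mechanics:
            if mech in expanded:
                continue
            # Siblings: identical condition-action pair, sprites from the same family.
            if (mech.condition.kind, mech.action.kind) == (crit.condition.kind, crit.action.kind) \
                    and same_family(mech.sprites, crit.sprites):
                expanded.append(mech)
    return expanded
```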

Mechanic-Augmented MCTS

After the critical mechanics for a particular game have been found, we can augment an MCTS agent with this mechanic information. Traditionally, the evaluation function of an MCTS agent takes into account the game state at the end of the simulation phase of the algorithm. This reward is then backpropagated up the tree. However, we can modify this evaluation function to take into account all simulated event data as well, adding additional rewards for any simulated events that match conditions of critical path mechanics. This is a similar approach to the use of subgoals in hierarchical planning mentioned previously in the background section, one difference being that the agent is a simple MCTS agent with no other improvements rather than a hierarchical MCTS or reinforcement learning agent. Another notable difference is that the subgoals defined here are represented as game mechanics, rather than game states. This partial state abstraction affords a greater degree of generality across domains.

The value of these additional mechanic rewards decreases with frequency using a discount factor, similar to the discount rate used in reinforcement learning. Each time the agent has already triggered a specific mechanic during its planning history, the reward for triggering it again decreases by a factor of D. The reward is also decreased the further out in planning the agent finds the mechanic. Therefore, mechanics that get triggered earlier on in planning backpropagate greater rewards than those that happen later. This creates an urgency in a mechanic-augmented agent, focusing the agent’s search on areas where mechanics trigger early and frequently. The reward equation for a single instance of a critical mechanic during planning is given in Equation 1, where B is a constant, D is the decay rate, C is the number of times this mechanic has been triggered so far, F is the game tick of the current node in the planning tree, and N is the game tick of the root node of the tree.

r = B · D^C / (F - N)    (1)
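A sketch of how such a reward could enter the augmented agent's state evaluation, following Equation 1 and the description above; the constants, the guard against a zero tick difference, and the bookkeeping of occurrence counts are assumptions made for illustration.

```python
def mechanic_reward(base, decay, occurrences, node_tick, root_tick):
    """Reward for one triggered critical mechanic: decays with repeated use
    and with how far into the planning horizon it occurs."""
    return (base * decay ** occurrences) / max(node_tick - root_tick, 1)

def evaluate_state(game_score, triggered_mechanics, counts, root_tick,
                   base=10.0, decay=0.9):
    """Baseline game-score evaluation plus decayed bonuses for critical mechanics."""
    value = game_score
    for mech, node_tick in triggered_mechanics:          # (mechanic, tick it fired at)
        value += mechanic_reward(base, decay, counts.get(mech, 0), node_tick, root_tick)
        counts[mech] = counts.get(mech, 0) + 1           # future triggers are worth less
    return value
```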

Experiments

This system is designed so that either human or agent playtraces are acceptable, as long as the game can be beaten. Thus, to run experiments on the algorithm for creating critical paths of mechanics, we collected human playtraces. Participants each played several different levels of each of the following four GVGAI games:

  • Solarfox: a port of Solar Fox (Bally/Midway Mfg. Co 1981). The goal is to collect all the gems in the level while dodging the flames thrown by enemies. Each gem collected gives the player a point. Several levels contain “powered gems,” which transform into normal gems after a certain amount of time if they are not collected; otherwise they are worth no points.

  • Zelda: inspired by The Legend of Zelda (Nintendo 1986). To win, the player must pick up the key and unlock the door. Monsters populate the level and can kill the player, causing them to lose. The player can swing a sword; if the sword hits a monster, the monster is destroyed and the player gains a point.

  • Plants: inspired by Plants vs. Zombies (PopCap Games 2009). If the player survives for 1000 game ticks, they win. Zombies spawn on the right side of the screen and move towards the left. The player loses if a zombie reaches the left side. The player needs to grow plants on the left side of the screen. Plants automatically fire zombie-killing peas, and the player gains a point if a pea kills a zombie. Occasionally, zombies throw axes to the left side, which destroy plants.

  • RealPortals: inspired by Portal (Valve 2007). The player must reach the goal, which is sometimes behind a locked door that needs a key. Movement is restricted by water, which kills the player on contact. To succeed, players need to pick up wands, which let them toggle between creating portal entrances and portal exits, through which they can travel across the map. Some levels also contain potions, which the player can push into the water to transform the water into solid ground.

These games were selected based on previous work [3], which categorized these games as ones that MCTS algorithms perform particularly poorly on. They also contain a diverse array of mechanics and terminal conditions: time-based (Plants), lock-and-key (Zelda and RealPortals), and collection (SolarFox).

Game | Participants | System-Discovered Mechanics
Solarfox | 23 | collide with a gem to collect it; collect all gems to win
Zelda | 26 | pick up the key; unlock the door with the key to win
Plants | 18 | press space to use the shovel; use the shovel on dirt to make a plant; if axe and a plant collide, both are destroyed; survive for 1000 ticks to win
RealPortals (level 1) | 26 | press space to shoot a missile; if the missile collides with a wall, it turns into a portal; pick up different wands to toggle between portal types; teleport from the portal entrance to the portal exit; you can't go through the portal exit; collide with the key to pick it up; unlock the door with the key; collide with the goal to capture it and win
RealPortals (levels 2 & 3) | 26 | all mechanics above, plus: if a potion collides with the portal entrance, it is teleported to the portal exit; collide with a potion to push it; if a potion collides with water, the water is turned into ground
Table 1: The number of human participants for each game, and the mechanics discovered by the system. RealPortals is split into level 1 and levels 2 & 3 because several key mechanics are introduced in the later levels.

We ran our system on these four games with an average of 23 human playtraces per game. Table 1 shows the identified critical path for each game. The table was made using the raw mechanic information output by our system, translated by humans into a more understandable form. For example, the original game rule “MultiSpriteCounter stype1=blib stype2=powerblib limit=0 win=True” essentially means “collect all gems to win.” The generated critical path contains only the minimum mechanics required to win the level, as identified by our system. For example, in Zelda the critical mechanics do not include mechanics related to destroying enemies, because the player does not need to destroy enemies to win (unless they are blocking the way). Similarly, the critical mechanics for Plants do not contain any mechanics about destroying zombies, since this is implied once the player creates plants. Lastly, in RealPortals, the game introduces potions in levels 2 & 3 (they do not exist in level 1), which caused the system to generate a more complex critical path for those levels.

Evaluation

Before a critical path of mechanics can be used by the system (such as in the creation of tutorials, as discussed before), it is necessary to verify that the subgoals/mechanics in the path are actually “critical,” i.e. important for achieving good performance in the game. In our experiments, we validate the system-identified critical paths in two ways: a user study comparing human-identified critical mechanics against the system-identified ones, and a baseline-vs-augmented agent comparison. The user study verifies that humans (as a baseline) also identify the same critical mechanics as important for good performance. The agent-comparison experiment verifies that (at least from the perspective of a game-playing artificial agent) triggering critical mechanics during gameplay results in better performance. The following subsections explain the human-identified mechanic comparison study and present the results of the MCTS agent comparison study.

Human-identified Mechanic Comparison Study

In the user study, we compare the critical mechanics discovered by our system to those that humans believe are important for good performance. After playing the levels of each game during the playtrace collection, participants were given the following prompt: “In short sentences, describe what the player needs to do in order to perform well in the game.” This particular phrasing was selected in order to get a full understanding of which mechanics players prioritize in the particular game. The participants responded using a free-text answer space, chosen to avoid biasing answers as predefined responses might have. We then grouped these responses into categories for easier analysis. For every critical path subgoal the system identified in every game, we record the percentage of users who believed the subgoal is important. We also include all other mechanics that participants thought were important but the system did not identify.

Game | Mechanic | Mentioned by participants | In critical path
Solarfox | Avoid flames | 68% |
Solarfox | Collide with gems to pick them up | 64% | X
Solarfox | Avoid walls | 18% |
Zelda | Collide with the key to pick it up | 80% | X
Zelda | Unlock the door with the key | 80% | X
Zelda | Kill enemies with sword | 76% |
Zelda | Avoid dying by colliding with enemies | 60% |
Zelda | Navigate the level walls using arrow keys | 20% |
Zelda | Move quickly | 12% |
Plants | Press space to use the shovel | 100% | X
Plants | Use the shovel on grass to plant plants | 100% | X
Plants | Plants kill zombies by shooting pellets | 76% |
Plants | When plants get hit with axes, both are destroyed | 53% | X
Plants | Protect the villagers from zombies for some time | 35% |
Plants | Add plants to different areas to get good coverage | 29% |
Plants | Axes don't affect the player | 6% |
RealPortals | Press space to shoot a missile | 72% | X
RealPortals | If the missile collides with a wall, it turns into a portal | 72% | X
RealPortals | If a potion collides with water, the water is turned into ground | 72% | X
RealPortals | Unlock the door with the key | 68% | X
RealPortals | Collide with the goal to capture it | 52% | X
RealPortals | Collide with the key to pick it up | 48% | X
RealPortals | Pick up different wands to toggle between portal types | 44% | X
RealPortals | Teleport from the portal entrance to the portal exit | 44% | X
RealPortals | Collide with a potion to push it | 40% | X
RealPortals | Avoid dying by colliding with water or a portal entrance with no exit | 32% |
RealPortals | If a potion collides with the portal entrance, it is teleported to the portal exit | 16% | X
RealPortals | You can't go through the portal exit | 0% | X
Table 2: The percentage of participants who mentioned each mechanic in the user study. An X marks mechanics that appear in the system-identified critical path.

Table 2 shows the results of the user study. Mechanics identified by the system's critical-mechanic discovery method have the highest percentages of mentions by participants in all games except Solarfox. In Solarfox, slightly more people think that avoiding flames is more important than collecting the gems. We postulate that the constant movement of the player (the player can only change direction, not speed) and the large collision areas of the flames caused some users to focus more on flame avoidance than on collecting gems. Humans not only identify mechanics that are important for winning but also ones for avoiding losing. For example, in Zelda, “Avoid dying by colliding with enemies” is identified by 60% of participants. Other participants note subgoals that reflect a better playing strategy, such as “Add plants to different areas to get good coverage.” The last type of mechanic identified by participants pertains to scoring higher: in Zelda, the “Kill enemies with sword” mechanic appears 76% of the time, and in Plants, the “Plants kill zombies by shooting pellets” mechanic also appears 76% of the time.

One system-identified mechanic in RealPortals, “You can't go through the portal exit,” was never mentioned by any of the participants. We hypothesize there may be several reasons for this, one being that the mechanic seems trivial to humans. It occurs in the playtraces because of the way the game is implemented in VGDL: after teleporting from entrance to exit, the game forces the player to step away from the exit. Participants who beat the game may not have thought it important enough to mention, and players who were unable to beat the game might never have realized that the portals were of different types and colors.

Baseline-vs-Augmented Agent Study

In this evaluation, we compare the performance of a baseline MCTS agent against an MCTS agent augmented with the critical mechanics discovered for Solarfox, Zelda, Plants, and RealPortals. The baseline MCTS agent is the one that comes with the GVG-AI framework and is used for benchmarking in other GVG-AI papers. Each agent is given a set of unique levels to play for each game; some of these levels are identical to the user study levels, while the others are unique to this evaluation. Each level is played multiple times. An agent is permitted 500 milliseconds to decide its next action each turn. All experiments took place on an Intel Xeon E5-2690v4 2.6GHz CPU within a Java Virtual Machine limited to 8GB of memory. Each experiment was allowed a maximum of 48 hours; none reached this limit.

Figure 2a displays a comparison of win rates between the agents, and Figure 2b displays average normalized scores with confidence intervals. Scores are normalized per level using the maximum and minimum obtainable scores for that level and then averaged together. Zelda and Solarfox both have fixed maximum and minimum scores for all levels. Because the maximum score in Plants depends on randomness, we instead score agent performance by survival time. RealPortals does not have an upper bound on score due to the nature of its game mechanics, so we clamp scoring to the minimum optimal score needed to solve each level.
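The per-level normalization can be sketched as a standard min-max rescaling followed by averaging across levels; the clamping step is an assumption of how the RealPortals score cap described above might be applied.

```python
def normalize_score(score, level_min, level_max):
    """Min-max normalize a raw score into [0, 1] for one level."""
    score = min(max(score, level_min), level_max)        # clamp to the obtainable range
    return (score - level_min) / (level_max - level_min)

def mean_normalized_score(scores_by_level, bounds_by_level):
    """Average the normalized scores over all playthroughs of all levels of a game."""
    norms = [normalize_score(s, *bounds_by_level[lvl])
             for lvl, runs in scores_by_level.items() for s in runs]
    return sum(norms) / len(norms)
```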

(a) The win rates of both agents on all four games.
(b) The mean normalized scores of both agents on all four games.
Figure 2: Comparing the performance of the vanilla MCTS agent and the augmented agent.

From Figure 2, we can see that the augmented agent was able to perform better than the vanilla MCTS agent on all games. The low win rates in RealPortals may be a consequence of the complexity of the game. Subgoal events are sparse in RealPortals and may not appear for long stretches of game time. Additionally, it is possible for the agent to trap itself in an unwinnable state, which may also be a factor.

Discussion

Based on the results from the user study and the mechanic-augmented agent experiments, we conclude that our system is successful at correctly identifying most critical mechanics for a variety of games. This suggests that the method could be a crucial component in a tutorial generation system.

For Zelda, Solarfox, and Plants, the augmented agent results demonstrate a significant performance improvement over baseline when the critical path is incorporated into the search algorithm. However, humans identify important mechanics that are not present in the system-generated critical path, like the fact that the player can kill enemies in Zelda, that one should avoid flames in Solarfox, or that peas kill zombies in Plants. We attribute this to the way our system searches for the critical path. The goal of the system is to find a least-cost path of mechanics that results in a winning state; it does not search for paths that avoid a losing state. As a result, it will not actively include mechanics that may be important to players in order to avoid dying or losing the game (like protecting villagers in Plants or avoiding flames in Solarfox), nor will it actively search for mechanics that are already implied by other mechanics. This is reflected in the mechanic-augmented agent's higher scores, since, in all four games, higher scores typically result from moving toward a winning state. Despite the lack of a win-rate improvement in agent performance for RealPortals, the augmented agent did score higher on average than the baseline. Furthermore, users concurred with the system-identified mechanics, suggesting that for this game (and perhaps others similar to it) the agent will have to be more intelligently augmented with mechanics.

We speculate that some of the success of the augmented agent can be explained by the previously discussed “urgency” created by the way critical paths inform the agent's tree search. No longer indecisive, the agent is now driven toward a singular goal of, for example, grabbing the key and unlocking the door in Zelda, destroying any monsters that prevent it from doing so (and thereby gaining more points). In contrast, the baseline agent appears indecisive in its movements, unable to commit to a sequence of moves toward a specific goal since multiple paths seem equally rewarding.

There is an interesting discussion point in regards to a game like RealPortals. Even though agent performance is higher on a scoring basis, the augmented agent still could not reliably win levels, and the way it gains points (going back and forth between portals) can hardly be considered a successful strategy for a human being. In cases like this, we believe a system like ours could be the foundation of an intelligent debugging process for game developers. If developers do not observe the desired behavior from agents that are informed of the game's mechanics, they could modify the game's rules or levels to better account for it.

This work can be used to further research in mechanic discovery and mechanic usage in games and game applications. By using past playtraces and game time as units of measure, our system is able to identify mechanics and augment MCTS agents with them, improving agent performance. These mechanics might be able to augment agents in other ways too, such as serving as intermediate rewards during training to help reinforcement learning agents generalize better.

This critical mechanic discovery method is primarily meant to be used within tutorial generation, such as the AtDelfi system [8], to automatically construct tutorials that teach humans how to play games. Prior research [9] demonstrates that mechanics can be used to generate levels. In addition to the arcade games shown here, our system could be extended in future work to incorporate more complex games. Our system can capture macro actions in these larger goal-oriented games (and in ones where players define their own goals), which can then be used to extract the critical mechanics as demonstrated in this paper. This work is compatible with the idea of game state compression [5], in the sense that mechanics which could be defined as “irreversible actions” might give insight into which should be defined as “critical,” or vice versa.

Conclusion

In this work, we present a system that extracts relevant mechanics from playtraces. Due to the nature of video games, we used human intuition as our baseline for critical mechanic discovery. Our system was able to identify most of the same mechanics that humans believed to be important for good performance. We also use these mechanics to augment an MCTS algorithm to improve game-playing performance. In all four tested games (Zelda, Solarfox, Plants, and RealPortals), agents showed improvement in win rates and/or average scores, demonstrating that our method is adaptive to a variety of arcade-style games in the GVGAI framework. Because MCTS is widely applicable across domains, we believe that this method will be similarly widely applicable.

As a next step, we intend to further explore uses for this method of mechanic discovery in games, such as using critical mechanics as intermediate rewards for reinforcement learning agents. We believe this research could be the foundation of an intelligent debugging process for game developers, allowing them to adjust a game's rules and levels in response to the playtraces of an agent augmented with the game's mechanics. Furthermore, we plan on improving existing tutorial generation systems by automatically generating instructions and levels that teach game mechanics using this approach.

Acknowledgements

Michael Cerny Green acknowledges the financial support of the GAANN program. Ahmed Khalifa acknowledges the financial support from NSF grant (Award number 1717324 - “RI: Small: General Intelligence through Algorithm Invention and Selection.”). Tiago Machado acknowledges the financial support from CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico under the Science without Borders scholarship 202859/2015-0.

References

  • [1] M. Asadi and M. Huber (2005) Autonomous subgoal discovery and hierarchical abstraction for reinforcement learning using monte carlo method.. In AAAI, pp. 1588–1589. Cited by: Introduction, Subgoals in Reinforcement Learning and Planning.
  • [2] A. Bai, S. Srivastava, and S. J. Russell (2016) Markovian state and action abstractions for mdps via hierarchical mcts.. In IJCAI, pp. 3029–3039. Cited by: Subgoals in Reinforcement Learning and Planning.
  • [3] P. Bontrager, A. Khalifa, A. Mendes, and J. Togelius (2016) Matching games and algorithms for general video game playing. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: Experiments.
  • [4] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton (2012) A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games 4 (1), pp. 1–43. Cited by: Introduction, Monte Carlo Tree Search (MCTS).
  • [5] M. Cook and A. Raad (2019) Hyperstate space graphs for automated game analysis. In COG, Cited by: Introduction, Discussion.
  • [6] M. Ebner, J. Levine, S. M. Lucas, T. Schaul, T. Thompson, and J. Togelius (2013) Towards a video game description language. Dagstuhl Reports. Cited by: General Video Game Artificial Intelligence Framework (GVG-AI).
  • [7] R. D. Gaina, D. Pérez-Liébana, and S. M. Lucas (2016) General video game for 2 players: framework and competition. In 2016 8th Computer Science and Electronic Engineering (CEEC), pp. 186–191. Cited by: General Video Game Artificial Intelligence Framework (GVG-AI).
  • [8] M. C. Green, A. Khalifa, G. A. Barros, T. Machado, A. Nealen, and J. Togelius (2018) AtDELFI: automatically designing legible, full instructions for games. In Proceedings of the 13th International Conference on the Foundations of Digital Games, pp. 17. Cited by: Introduction, Mechanic Graph Generation, System Overview, Discussion.
  • [9] M. C. Green, A. Khalifa, G. A. Barros, A. Nealen, and J. Togelius (2018) Generating levels that teach mechanics. In Proceedings of the 13th International Conference on the Foundations of Digital Games, pp. 55. Cited by: Introduction, Discussion.
  • [10] M. C. Green, A. Khalifa, G. A. Barros, and J. Togellius (2017) ”Press space to fire”: automatic video game tutorial generation. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: Introduction.
  • [11] E. J. Jacobsen, R. Greve, and J. Togelius (2014) Monte mario: platforming with mcts. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 293–300. Cited by: Monte Carlo Tree Search (MCTS).
  • [12] A. Khalifa, M. C. Green, G. Barros, and J. Togelius (2019) Intentional computational level design. In GECCO, Cited by: Introduction.
  • [13] A. Khalifa, M. C. Green, D. Perez-Liebana, and J. Togelius (2017) General video game rule generation. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), pp. 170–177. Cited by: General Video Game Artificial Intelligence Framework (GVG-AI).
  • [14] A. Khalifa, D. Perez-Liebana, S. M. Lucas, and J. Togelius (2016) General video game level generation. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 253–259. Cited by: General Video Game Artificial Intelligence Framework (GVG-AI).
  • [15] L. Kocsis, C. Szepesvári, and J. Willemson (2006) Improved monte-carlo search. Univ. Tartu, Estonia, Tech. Rep 1. Cited by: Monte Carlo Tree Search (MCTS).
  • [16] L. Kocsis and C. Szepesvári (2006) Bandit based monte-carlo planning. In European Conference on Machine Learning, pp. 282–293. Cited by: Monte Carlo Tree Search (MCTS).
  • [17] H. Ma and J. Pineau (2015) Information gathering and reward exploitation of subgoals for pomdps. In Twenty-Ninth AAAI Conference on Artificial Intelligence, Cited by: Subgoals in Reinforcement Learning and Planning.
  • [18] O. Maron and T. Lozano-Pérez (1998) A framework for multiple-instance learning. In Advances in neural information processing systems, pp. 570–576. Cited by: Subgoals in Reinforcement Learning and Planning.
  • [19] A. McGovern and A. G. Barto (2001) Automatic discovery of subgoals in reinforcement learning using diverse density. In International Conference on Machine Learning, Cited by: Introduction, Subgoals in Reinforcement Learning and Planning.
  • [20] D. Perez, E. J. Powley, D. Whitehouse, P. Rohlfshagen, S. Samothrakis, P. I. Cowling, and S. M. Lucas (2014) Solving the physical traveling salesman problem: tree search and macro actions. IEEE Transactions on Computational Intelligence and AI in Games 6 (1), pp. 31–45. Cited by: Monte Carlo Tree Search (MCTS).
  • [21] D. Perez-Liebana, J. Liu, A. Khalifa, R. D. Gaina, J. Togelius, and S. M. Lucas (2019) General video game ai: a multi-track framework for evaluating agents, games and content generation algorithms. Transactions on Games. Cited by: General Video Game Artificial Intelligence Framework (GVG-AI).
  • [22] D. Perez-Liebana, S. Samothrakis, J. Togelius, T. Schaul, and S. M. Lucas (2016) General video game ai: competition, challenges and opportunities. In Thirtieth AAAI Conference on Artificial Intelligence, Cited by: General Video Game Artificial Intelligence Framework (GVG-AI).
  • [23] R. R. Torrado, P. Bontrager, J. Togelius, J. Liu, and D. Perez-Liebana (2018) Deep reinforcement learning for general video game ai. In 2018 IEEE Conference on Computational Intelligence and Games (CIG), pp. 1–8. Cited by: General Video Game Artificial Intelligence Framework (GVG-AI).
  • [24] N. A. Vien and M. Toussaint (2015) Hierarchical monte-carlo planning. In Twenty-Ninth AAAI Conference on Artificial Intelligence, Cited by: Subgoals in Reinforcement Learning and Planning.