Super Mario as a String: Platformer Level Generation Via LSTMs

03/02/2016 ∙ by Adam Summerville, et al. ∙ University of California Santa Cruz 0

The procedural generation of video game levels has existed for at least 30 years, but only recently have machine learning approaches been used to generate levels without specifying the rules for generation. A number of these have looked at platformer levels as a sequence of characters and performed generation using Markov chains. In this paper we examine the use of Long Short-Term Memory recurrent neural networks (LSTMs) for the purpose of generating levels trained from a corpus of Super Mario Brothers levels. We analyze a number of different data representations and how the generated levels fit into the space of human authored Super Mario Brothers levels.



There are no comments yet.


page 2

page 6

page 7

page 14

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.


Procedural Content Generation (PCG) of video game levels has existed for many decades, with the earliest known usage coming from Beneath Apple Manor in 1978. As an area of academic study, there have been numerous approaches taken ranging from human-authored rules of tile placement ([Shaker et al.2011]), to constraint solving and planning ([Smith, Whitehead, and Mateas2010]). These approaches require the designer of the algorithm to specify either the rules of the generation or a method for evaluating the generated levels. More recently, machine learning approaches have been used so that a designer can train a generator from examples. These have ranged from matrix factorization ([Shaker and Abou-Zleikha2014]) ([Summerville and Mateas2015]), to a number of different Markov chain approaches ([Dahlskog, Togelius, and Nelson2014])([Snodgrass and Ontañón2013])([Summerville, Philip, and Mateas2015]), to different graph learning approaches ([Guzdial and Riedl2015])([no and Missura2015]), to Bayesian approaches ([Summerville et al.2015]

). A number of these approaches treat the level as a sequence of characters, and use Markov chains to learn character-to-characters transition probabilities. These approaches can have good local coherence but struggle with global coherence. This can be remedied by considering longer and longer histories, but this leads to a rapidly exploding combinatoric space leading to a very sparse set of learned transitions.

Long Short-Term Memory recurrent neural networks (LSTMs) represent the state of the art in sequence learning. They have been used for translation tasks ([Sutskever, Vinyals, and Le2014]), speech understanding ([Graves, rahman Mohamed, and Hinton2013]), video captioning ([Venugopalan et al.2014]), and more. In this paper we present a method for utilizing LSTMs for the purpose of learning and generating levels. We examine eight different data representations for the levels and compare the generated levels using a number of different evaluation statistics.

Related Work

Machine learned platformer level generation has mostly been performed via learning Markov chain transition probabilities. The transitions in the Markov chains has seen two major approaches, either tile-to-tile transitions or vertical slices of the level. The vertical slice approach was first used by Dahlskog et al. ([Dahlskog, Togelius, and Nelson2014]) and was later used by Summerville et al. ([Summerville, Philip, and Mateas2015]). In this approach the states of the Markov chain are taken as a vertical slice of tiles and the transitions are learned from slice-to-slice. This has the benefit of directly encoding vertical structure which is important due to the fact that certain tiles are much more likely to be at the top (like empty tiles) and others are much more likely to be at the bottom (such as the ground). However, this comes with the severe drawback of removing the ability to generate novel vertical slices. It also has the problem that the state space is much larger than just the number of tiles, making the learned transition space much sparser than just learning tile-to-tile transitions.

Snodgrass and Ontañón ([Snodgrass and Ontañón2013, Snodgrass and Ontañón2014]) have used tile-to-tile transitions for the placement of tiles. Because of this, they do not have the problems of the vertical slice approach, but this brings a host of other issues. Namely, it can be far too likely for the system to generate sequences of tiles that violate implicit semantic constraints, such as generating ground tiles at the top of the level. Multiple attempts were made to reduce such problems via learning different chains for varying heights, or learning a hierarchical Markov chain ([Snodgrass and nón2015]). We build on the tile-to-tile approach for the work we report here as it is better able to generalize over a larger collection of levels. With the column based approach it is very easy to imagine that a new level might have a column that has never been seen before and so the system will be unable to generalize, but it is much harder to imagine a single-tile-to-tile transition that has never been seen (unless it is one that should never be seen like an enemy embedded inside a pipe).

The above approaches both have the drawback of no guarantees about playability. In Snodgrass’s work, over half of the levels produced were unable to be completed by a simulated agent ([Snodgrass and Ontañón2013]). Summerville, et al. ([Summerville, Philip, and Mateas2015]) used Monte Carlo Tree Search as a way of guiding Markov chain construction of levels to help guarantee a playable level. This does allow for some authorial input on the part of a designer by letting them change the reward function to create levels with different features (e.g. more/less gaps), at the cost of biasing the generation away from the learned transition probabilities in ways that might not be easily understood or resolved. Despite the different applications of Markov chains, they all suffer from the same main drawback, that there is no good way to have a long memory in the chain without needing massive amounts of data to fill the transition table. This leads to good local coherence but poor global coherence, in that they can reliably pick the next tile in a way that makes sense, but have difficulty handling larger structures or patterns.

The work of Guzdial and Riedl ([Guzdial and Riedl2015]) used clustering to find latent groupings and then learned relative placements of tile blocks in Mario levels. This has the benefit of learning structure over much longer ranges than is possible in the Markov chain examples. While the work of Dahlskog was able to use 3 tiles worth of history Guzdial’s work considered tile distances up to 18 tiles in distance. A similar graph approach was used by Londoño and Missura ([no and Missura2015]), but learned a graph grammar instead. Their work is unique in platformer level generation in that they used semantic connections such as tile X is reachable from tile Y instead of just physical connections.

The only other known Neural Network approach is that of Hoover et al. [Hoover, Togelius, and Yannakakis2015]. They used a single level at a time as training input and tried to predict the height of a specific block type given the height of that block type in the previous 2 time steps. Extensions tried to predict the height of a given block type given the other block types, the idea being that a user could draw in portions of the level and allow the system to fill in the rest.

Procedural generation via neural networks has most been focused on image and text generation. Both have typically been byproducts of trying to train a network for some sort of classification process. Google’s Inceptionism (

[Szegedy et al.2014]

) took the product of training an image classifier and manipulated images to better match the set classification class, such as dogs. Facebook’s Eyescream had image generation as the goal (

[Denton et al.2015]) and used adversarial networks, one network trying to generate images and the other trying to determine if what it was seeing was generated or real.

Text generation from neural networks has been done by numerous people for various goals such as translation ([Sutskever, Vinyals, and Le2014]), text classification ([Graves2013]), and even trying to determine the output of code simply by reading the source code ([Zaremba and Sutskever2014]) and then writing what it believes to be the output.

The work of Graves ([Graves2013]) went beyond text generation and used sequences of stylus locations and pressure to learn calligraphic character writing. Similarly, the DRAW system ([Gregor et al.2015]) used LSTMs to generate address signs learned from Google street view by using an attentional system to determine where to focus and then using that as the sequencing of where to place a virtual paint brush and what color to paint, e.g. it learns a sequence of .

Lstm Sequence Generation

In this section we will first cover how LSTM Recurrent Neural Networks operate. We will then cover the basics of our data specification as well as the variants we used.

Recurrent Neural Network Architecture

Figure 1: Graphical depiction of an LSTM block.

Recurrent Neural Networks (RNNs) have been used for a number of different sequence based tasks. They operate in a manner similar to standard neural networks, i.e. they are trained on data and errors are back-propagated to learn weight vectors. However, in an RNN the edge vectors are not just connected from input to hidden layers to output, they are also connected from a node to itself across time. This means that back-propagation occurs not just between different nodes, but also across time steps. However, a common problem with standard RNNs is known as “the vanishing gradient problem” (

[Hochreiter1991]), where the gradient of the errors very quickly dissipates across time. While different in why it occurs, the end result is similar to the problem found with Markov chains, i.e. good local coherence but poor global coherence.

LSTMs are a neural network topology first put forth by Hochreiter and Shmidhuber ([Hochreiter and Schmidhuber1997]

) for the purposes of eliminating the vanishing gradient problem. LSTMs work to solve that problem by introducing additional nodes that act as a memory mechanism, telling the network when to remember and when to forget. A standard LSTM architecture can be seen in figure 1. At the bottom are nodes that operate as in a standard neural network, i.e. inputs come in, are multiplied by weights, summed, and that is passed through some sort of non-linear function (most commonly a sigmoid function such as the hyperbolic tangent) as signified by the S-shaped function. The nodes with

simply sum their inputs with no non-linear activation, and the nodes with simply take the product of their inputs. With that in mind, the left-most node on the bottom can be thought of as the input to an LSTM block (although for all purposes it is interchangeable with the node second from the left). The node second from the left is typically thought of as the input gate, since it is multiplied with the input, allowing the input through when it is close to 1, and not allowing the input in when it is close to 0. Similarly, the right-most node on the bottom acts as a corresponding output gate, determining when the value of the LSTM block should be output to higher layers. The node with the acts as the memory, summing linearly and as such not decaying through time. It feeds back on itself by being multiplied with the second from the right node, which acts as the forget gate, telling the memory layer when it should drop whatever it was storing.

Figure 2: Graphical depiction of our chosen architecture. The green box at the bottom represents the current tile in the sequence, the tile to the left the preceding tile, and the tile to the right the next tile. The tile in the top orange box represents the maximum likelihood prediction of the system. Each of the three layers consists of 512 LSTM blocks. All layers are fully connected to the next layer.

These LSTM blocks can be composed in multiple layers with multiple LSTM blocks per layer, and for this work we used 3 internal layers, each consisting of 512 LSTM blocks which can be seen in figure 2. The input layer to the network consists of a One-Hot encoding where each tile has a unique binary flag which is set to 1 if the tile is selected and all others are 0. The final LSTM layer goes to a SoftMax layer, which acts as a Categorical probability distribution for the One-Hot encoding.

Our networks were trained using Torch7 ([Collobert, Kavukcuoglu, and Farabet]) based on code from Andrej Karpathy ([Karpathy2015]

). In addition to choosing size and depth of the network, RNNs (and subsequently LSTMs) come with the additional hyperparameter of how many time steps to consider during the back-propagation through time, which we set to 200 data points. A common problem in machine learning is that of memorization, where the algorithm exactly memorizes the input data at the expense of being able to generalize to unseen data. There exist numerous methods to avoid this overfitting, but for this work we used dropout (

[Srivastava et al.2014]). Dropout is a technique where during training a pre-specified percentage of LSTM blocks are dropped from the network (along with their corresponding connections) at random for each training instance.

This has the effect of effectively training an ensemble of networks during the training of one network and reduces the amount of co-adapting that occurs across nodes. Having decided on the network architecture, we will now get to the core of our data representation and what we mean by a data point.

Data Specification

LSTMs are most commonly used to predict the next item in a sequence. They do this by producing a probability distribution over possible next items and predicting the most likely item. For this to work, our data must be a sequence. If we were to consider a format such as the one used by Dahlskog et al. ([Dahlskog, Togelius, and Nelson2014]) we could simply consider a level as a sequence of slices progressing from left-to-right, but this comes with multiple drawbacks:

  • It can only reproduce vertical slices encountered in the training set - meaning that it could be utterly unable to handle a previously unseen slice

  • This drastically increases the size of the input space as we will see in a second

Instead we treat each individual tile of a level from Super Mario Brothers as a point in our sequence, that is, like a character in a string. The tile types we consider are:

  • Solid - Any tile that is solid and has no other interactions, most commonly ground, giant mushroom, or inert block tiles of which no distinction is made

  • Enemy - Any enemy, again no distinctions are made between enemies

  • Destructible Block - A block that can be destroyed by Super Mario

  • Question Mark Block With Coin - A ?-Block that only contains a coin

  • Question Mark Block With Power-up - A ?-Block that contains a powerup, either a mushroom/flower, a star, or a 1-up

  • Coin - A coin

  • Bullet Bill Shooter Top - The top of a Bullet Bill shooting cannon

  • Bullet Bill Shooter Column - The column that supports the top of the Bullet Bill shooter

  • Left Pipe - The left side of a pipe

  • Right Pipe - The right side of a pipe

  • Top Left Pipe - The top left side of a pipe

  • Top Right Pipe - The top right side of a pipe

  • Empty - The lack of all other tile types

Distinctions are made between the variants of ?-Blocks due to the extreme difference in effect that they make in the game. The vast majority of ?-Blocks contain only coins and if we were to ignore the difference between ?-Blocks we would be losing crucial information about where the most important power-up bearing blocks are located. Pipes are one of the most important structures in the game. Since previous systems have shown difficulty accurately producing non-gibberish pipes ([Snodgrass and Ontañón2013]), we want to consider the different parts of the pipe separately. It is important to know that the left half of a pipe is different from the right half in the sequence, as the latter can never be encountered before the former.

Each of these tile types can be considered as a character in a string, but a platformer level has 2 dimensions, not just 1, meaning that we have to induce an ordering across the 2 dimensions. The naïve ordering would be left-to-right and most likely bottom-to-top, as the levels progress from left-to-right and there is a higher density of important tiles at the bottom of a level due to the nature of simulated gravity. However, this sequencing would be highly problematic for most platformers. Levels in Super Mario Brothers are between 200 and 500 tiles in width, which would mean that the LSTM would need to remember for at least that distance what tile it had placed in the previous row. While LSTMs are good at remembering over longer distances, this would strain the algorithm for no particularly important reason.

Instead, we consider eight different induced tile orderings determined by answers to the following three questions:

  • Bottom-To-Top or Snaking?

  • Includes Path Information?

  • Includes Column Depth Information?

We now discuss these possible orderings in depth.

Bottom-To-Top vs. Snaking

Figure 3: Illustration of the difference between going from bottom-to-top (LEFT) and snaking (RIGHT) for sequence creation

As mentioned, we are considering levels vertically and then horizontally. This comes first with a decision of whether to go bottom-to-top or top-to-bottom, of which we chose to go from bottom-to-top due to the higher density of important, non-empty tiles at the bottom of a level. This can be seen in the left of figure 3. To do this we include a special character in the sequences to indicate when we have reached the top of the level and are moving to the next column.

However, there is another, less intuitive manner of progressing through the level: snaking. By snaking we mean alternating bottom-to-top and top-to-bottom orderings as seen in the right of figure 3. In this snaking ordering we are following ([Sutskever, Vinyals, and Le2014]) which demonstrated that by reversing input streams one can induce better locality in the sequence. To see this, consider the pipe in figure 3. In the bottom-to-top ordering there are 17 tiles between the left and right halves of the pipe, while there are only 7 in the snaking ordering. Snaking also comes with the hidden benefit of doubling the size of our dataset. When snaking we generate two sequences for a level, one that starts from bottom-to-top as well as one that starts from top-to-bottom.

Path Information

In what we consider to be a novel contribution to the field of machine learned procedural level generation, we also considered the path(s) that a player would/could take through the level. A common problem with machine learned level generation has been in producing levels that are playable. Snodgrass and Ontañón found that only roughly half of all levels produced were able to be completed by a simulated agent. However, if we consider a player’s path through the level as a crucial piece of information in the specification of the level and include it in the training information, we are much more likely to generate playable levels since we are not just generating the level geometry but also a player’s path through the level.

We ran a simulated agent through existing levels using a tile-level A agent. Any empty space that the agent passed through was replaced with a special character denoting the player’s path. We considered all paths that were within 10 moves of optimal for this, but this could either be loosened to simulate worse paths or tightened to only consider optimal paths.

Column Depth

We also considered how far into the level we are. Obviously, the LSTM has some memory as to where it has been and therefore where it exists in the level, but our history of 200 tiles means that in essence it only remembers approximately 12 columns in the past. This captures most patterns that might be encountered, but is unlikely to capture larger considerations such as a ramping in intensity as the player progresses further and further into the level. To try to get the LSTM to understand level progression we introduced a special character that is incremented every 5 columns. In other words columns 0-4 have no special character, columns 5-9 have 1 special character, columns 10-14 have 2, etc.


Figure 4: The below ground seed (LEFT) and the above ground seed (RIGHT).
Snaking? Paths? Line #? Negative Log-Likelihood
N N N 0.0920
Y N N 0.1540
N Y N 0.0694
Y Y N 0.0573
N N Y 0.1376
Y N Y 0.1143
N Y Y 0.0404
Y Y Y 0.0177

Table 1: The error of the best network for each data format. The Snaking-Path-Depth has the lowest error.

We trained eight networks, one for each of the induced sequences. We had a total of 15 levels from Super Mario Brothers and 24 levels from the Japanese Super Mario Brothers 2

for a total of 39 levels. We used a 70%-30% training-evaluation split in the course of the training. After every 200 tiles in the training sequence we performed an evaluation on a held out evaluation set, determining how well the LSTM was able to predict unseen sequences. We save this version of the network if it has the best score on the evaluation set, with the belief that this will be the network that has the best ability to generalize from the training set. When the training reached a plateau for more than 2 epochs (runs through the training set) we stopped training. The error criterion is the negative log-likelihood, which is 0 if the correct value is predicted with 100% certainty and increases as the tile is less likely to be predicted.

S? P? D?
Human Authored
N N N 0.81 0.03 47.68 0.15 20.66 10.65
Y N N 0.77 0.04 33.46 0.12 28.61 6.81
N Y N 0.8 0.72 0.04 59.56 0.15 24.17 12.17
Y Y N 0.67 48.06 0.19 19.2 9.34
N N Y 0.82 0.02 59.53 0.17 19.69 9.96
Y N Y 0.76 45.99 0.1 15.04 6.68
N Y Y 0.81 0.03 46.84 0.17 18.31 8.21
Y Y Y 0.81 0.68 0.03 42.14 0.19 19.02 8.04
Table 2:

The mean values for each of the considered metrics. Values in bold are within 1 standard deviation of the original, human-authored levels.

The results of LSTMs over the evaluation task can be seen in table 1. The change that had the biggest effect on performance was the inclusion of path information, with the networks with paths doing roughly twice as well as the networks without. Interestingly, the other changes considered without path information worsened network performance, but in tandem with path information improved overall performance. This demonstrates the difficulty of determining a good data specification; each non path change of the data specification away from the naïve base case decreased the performance of the LSTM, but when combined with the path information produced performance over twice as good as the next best network and better than the naïve, despite the fact that the most complex specification had a vocabulary that was 36% larger from the naïve specification.

However, our goal is not to analyze how well a sequence of tiles might have believably come from an existing Super Mario Brothers game, but rather to generate new levels with properties hopefully similar to existing Super Mario Brothers levels. To that end we used each of the final trained networks to generate 4000 levels. Each generator was given a seed sequence and then asked to generate until it reached an end-of-level terminal symbol 2000 times for each of two different seeds, an underground and an above ground seed as seen in figure 4. Each seed consists of three columns of tiles, as well as an initial special character denoting the start of the level. We then analyzed these levels for a number of different properties. Some, such as linearity and leniency, have been used to evaluate platformer level generation since their inception ([Smith and Whitehead2010]). Others are based off of more recent proposals from Cannosa and Smith ([Canossa and Smith2015]). In this evaluation we have also included the metrics for the levels from the dataset. It is hard to generate an intuition for whether a given metric is a good indicator that the generator is producing content that we should be happy with, but since our goal is to produce levels that have properties similar to existing levels we can compare how well the output space of our generator matches that of the existing levels. The variants of generators are:

  • S? - Y if the generator was trained on snaking data, N if bottom-to-top

  • P? - Y if the generator had path information, N if not

  • D? - Y if the generator had depth information, N if not

The metrics we considered were:

  • - The percentage of the levels that are completable by the simulated agent

  • - The percentage of the level taken up by empty space

  • - The negative space of the level, i.e. the percentage of empty space that is actually reachable by the player

  • - The percentage of the level taken up by “interesting” tiles, i.e. tiles that are not simply solid or empty

  • - The percentage of the level taken up by the optimal path through the level

  • - The leniency of the level, which is defined as the number of enemies plus the number of gaps minus the number of rewards

  • - The linearity of the level, i.e. how close the level can be fit to a line

  • - The number of jumps in the level, i.e. the number of times the optimal path jumped

  • - The number of meaningful jumps in the level. A meaningful jump is a jump that was induced either via the presence of an enemy or the presence of a gap.

Figure 5: A corner plot of the expressive range for the original levels. The histograms along the diagonal are for each of the metrics listed along the bottom. The others are 2D contour plots showing the density of levels with the bottom metric being the X axis and the left metric being the Y axis.
Figure 6: A corner plot of the expressive range for the levels generated by the Snaking-Path-Depth generator.
Figure 7: 5 levels from the Snaking-Path-Depth generator without path information displayed.
Figure 8: 5 levels from the Snaking-Path-Depth generator with generated path information displayed.

In table 2 we can see the mean values of the level metrics for the different generators. A metric value is bold if the value was within one standard deviation of the value of the original levels. All of the generators generally performed well at matching the existing levels, at least in these metrics. Certain metrics, like percentage of empty space, leniency, linearity, percentage of decoration, # of jumps, and # of induced jumps were fairly consistently good across all generators. Interestingly, the lengths of paths through the generated levels were all consistent with each other and all were consistently higher than in the existing levels.

Not surprisingly, the generators with path information did much better at producing completable levels. To our knowledge, the playability of our generated levels is the best of any generator employing machine learning, beating the best reported of ([Snodgrass and nón2015]). The beats even the best reported human-authored system from the 2011 Mario AI Competition by Mawhorter of as reported by Snodgrass and Ontañón ([Snodgrass and nón2015]).

Following are two corner plots which show the density of generated levels across a single dimension (the outer diagonal corresponds to a histogram for the metric along the bottom) or across two dimensions (the left and the bottom corresponding to x and y dimensions). In figure 4 we can see the original human-authored levels and in figure 5 we can see the results from the Snaking-Path-Depth generator. Generally, the generator does a very good job of matching the expressive range of the original levels..

However, until we are able to train a system to analyze a level for how good it is or how fun it is (which seems a lofty AI-complete goal), such metric-based evaluation must be supplemented with an informal analysis of example generated levels to verify that the metrics actually correspond to human perceptions of high-quality levels. Following are a selection of levels chosen at random from the Snaking-Path-Depth generator. The first 5 are levels as they would be shown, the latter 5 with the generator’s included path information. We believe that these random samples demonstrate the quality of the generated levels and showcase the breadth of patterns it has learned, such as a series of pipes of differing heights, pyramidal structures, and even sequences of unconnected platforms that are still playable. But it is just a random sampling and all of the generated levels can be found online at

Ignoring the fact that the LSTMs are able to generate playable levels, it is interesting that the LSTMs are even able to produce coherent column-to-column tile placements. Even the fact that the LSTMs never incorrectly produce a column delimiting character is testament to their sequence learning capabilities. Without fail the LSTMs only produce columns of 16 tiles, exactly the size of the input columns. This does not come from telling the generator to stop or reset after 16 tiles, but from learned sequences.

Conclusion and Future Work

In this paper we have demonstrated a novel use of LSTM Recurrent Neural Networks for the generation of platformer levels, specifically Super Mario Brothers levels. Towards this, we have created multiple novel ways of interpreting a Super Mario Brothers level as a linear sequence. Most importantly, we have demonstrated that the introduction of player path information can dramatically improve the quality of generated levels, most specifically in the playability of generated levels, but also in terms of exhibiting level metrics closer to the human-authored levels. We have also used a wider range of metrics for characterizing our levels than have previously been used, allowing us to perform a more nuanced comparison between human-authored and generated levels.

We believe that the introduction of path information to the generation process also opens new avenues for automated analysis of levels. When a designer first designs a level, they often have a hypothesis of what the player will do before any playtesting has occurred. This system works in a similar way by assigning likelihoods to where a player will go based on observation. We would like to extend this system so that it not only generates levels, but also analyzes presented levels. While it is possible to send simulated agents through the level to generate analysis, something we ourselves did for this work, future work may enable the system to make suggestions based solely on observations without explicitly modeling or simulating a player. We would like to utilize human playthroughs gathered from gameplay video, to be able to suggest paths that actual human players would take instead of just simple A agents.

Recent work in neural network learning have used attention based systems to learn sequences from data that is not typically thought of as a sequence, such as an image. While the sequencing in this work is able to produce good results, we feel it could be made more robust by incorporating the sequencing into the learning task. We also believe that this would enable the system to operate in regimes where a simple left-to-right progression is unsuited, such as games like The Legend of Zelda or Metroid . These games tend to have highly non-linear progressions with numerous backtrackings and dead-ends required during play. An attentional system could learn to navigate these non-linear maps.

A similar concern for future work is deconvolving time and space. In standard platformer levels, the player progresses from left-to-right, which has led to many generators, this one included, to operate as if time has a linear mapping to space, e.g. x-coordinate. If we were to treat the screen as a tensor with dimensions of width, height, and tile type, we could perform a Tensor auto-regression across time. For a game like

Super Mario Brothers, this would work simply as progressing from left-to-right, but for the above mentioned non-linear games, this would work to eliminate the spurious correlation of time and x-coordinate.


  • [Canossa and Smith2015] Canossa, A., and Smith, G. 2015. Towards a procedural evaluation technique: Metrics for level design. In Proceedings of the FDG workshop on Procedural Content Generation in Games.
  • [Collobert, Kavukcuoglu, and Farabet] Collobert, R.; Kavukcuoglu, K.; and Farabet, C. Torch7: A matlab-like environment for machine learning.
  • [Dahlskog, Togelius, and Nelson2014] Dahlskog, S.; Togelius, J.; and Nelson, M. J. 2014.

    Linear levels through n-grams.

    In Proceedings of the 18th International Academic MindTrek Conference.
  • [Denton et al.2015] Denton, E. L.; Chintala, S.; Szlam, A.; and Fergus, R. 2015. Deep generative image models using a laplacian pyramid of adversarial networks. In CoRR.
  • [Graves, rahman Mohamed, and Hinton2013] Graves, A.; rahman Mohamed, A.; and Hinton, G. E. 2013. Speech recognition with deep recurrent neural networks. CoRR abs/1303.5778.
  • [Graves2013] Graves, A. 2013. Generating sequences with recurrent neural networks. In CoRR.
  • [Gregor et al.2015] Gregor, K.; Danihelka, I.; Graves, A.; and Wierstra, D. 2015. DRAW: A recurrent neural network for image generation. In CoRR.
  • [Guzdial and Riedl2015] Guzdial, M., and Riedl, M. O. 2015. Toward game level generation from gameplay videos. In Proceedings of the FDG workshop on Procedural Content Generation in Games.
  • [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computing.
  • [Hochreiter1991] Hochreiter, S. 1991. Learning causal models of relational domains. In Master’s thesis, Institut fur Informatik, Technische Universitat, Munchen.
  • [Hoover, Togelius, and Yannakakis2015] Hoover, A. K.; Togelius, J.; and Yannakakis, G. N. 2015. Composing video game levels with music metaphors through functional scaffolding. In Proceedings of the ICCC Workshop on Computational Creativity and Games.
  • [Karpathy2015] Karpathy, A. 2015. The unreasonable effectivenss of recurrent neural networks.
  • [Nintendo RD11986] Nintendo RD1. 1986. Metroid.
  • [Nintendo RD4 1986] Nintendo RD4 . 1986. Super Mario Brothers 2.
  • [Nintendo RD41983] Nintendo RD4. 1983. Super Mario Brothers.
  • [Nintendo RD41986] Nintendo RD4. 1986. The Legend of Zelda.
  • [no and Missura2015] no, S. L., and Missura, O. 2015. Graph grammars for super mario bros levels. In Proceedings of the FDG workshop on Procedural Content Generation in Games.
  • [Shaker and Abou-Zleikha2014] Shaker, N., and Abou-Zleikha, M. 2014. Alone we can do so little, together we can do so much: A combinatorial approach for generating game content. In

    AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

  • [Shaker et al.2011] Shaker, N.; Togelius, J.; Yannakakis, G. N.; Weber, B. G.; Shimizu, T.; Hashiyama, T.; Sorenson, N.; Pasquier, P.; Mawhorter, P. A.; Takahashi, G.; Smith, G.; and Baumgarten, R. 2011. The 2010 mario ai championship: Level generation track. IEEE Trans. Comput. Intellig. and AI in Games 3:332–347.
  • [Smith and Whitehead2010] Smith, G., and Whitehead, J. 2010. Analyzing the expressive range of a level generator. In Proceedings of the FDG workshop on Procedural Content Generation in Games.
  • [Smith, Whitehead, and Mateas2010] Smith, G.; Whitehead, J.; and Mateas, M. 2010. Tanagra: a mixed-initiative level design tool. In FDG, 209–216. ACM.
  • [Snodgrass and nón2015] Snodgrass, S., and nón, S. O. 2015. A hierarchical mdmc approach to 2d video game map generation. In Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference.
  • [Snodgrass and Ontañón2013] Snodgrass, S., and Ontañón, S. 2013. Generating maps using markov chains. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.
  • [Snodgrass and Ontañón2014] Snodgrass, S., and Ontañón, S. 2014. Experiments in map generation using markov chains. In Proceedings of the 2014 Conference on the Foundations of Digital Games (FDG 2014).
  • [Srivastava et al.2014] Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15:1929–1958.
  • [Summerville and Mateas2015] Summerville, A., and Mateas, M. 2015. Sampling hyrule: Multi-technique probabilistic level generation for action role playing games. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.
  • [Summerville et al.2015] Summerville, A.; Behrooz, M.; Mateas, M.; and Jhala, A. 2015. The learning of zelda: Data-driven learning of level topology. In Proceedings of the FDG workshop on Procedural Content Generation in Games.
  • [Summerville, Philip, and Mateas2015] Summerville, A.; Philip, S.; and Mateas, M. 2015. Mcmcts pcg 4 smb: Monte carlo tree search to guide platformer level generation. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.
  • [Sutskever, Vinyals, and Le2014] Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In CoRR.
  • [Szegedy et al.2014] Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2014. Going deeper with convolutions. In CoRR.
  • [Venugopalan et al.2014] Venugopalan, S.; Xu, H.; Donahue, J.; Rohrbach, M.; Mooney, R. J.; and Saenko, K. 2014. Translating videos to natural language using deep recurrent neural networks. CoRR abs/1412.4729.
  • [Zaremba and Sutskever2014] Zaremba, W., and Sutskever, I. 2014. Learning to execute. In CoRR.