Multi-Domain Level Generation and Blending with Sketches via Example-Driven BSP and Variational Autoencoders

06/17/2020 ∙ by Sam Snodgrass, et al. ∙ Northeastern University

Procedural content generation via machine learning (PCGML) has demonstrated its usefulness as a content and game creation approach, and has been shown to be able to support human creativity. An important facet of creativity is combinational creativity or the recombination, adaptation, and reuse of ideas and concepts between and across domains. In this paper, we present a PCGML approach for level generation that is able to recombine, adapt, and reuse structural patterns from several domains to approximate unseen domains. We extend prior work involving example-driven Binary Space Partitioning for recombining and reusing patterns in multiple domains, and incorporate Variational Autoencoders (VAEs) for generating unseen structures. We evaluate our approach by blending across 7 domains and subsets of those domains. We show that our approach is able to blend domains together while retaining structural components. Additionally, by using different groups of training domains our approach is able to generate both 1) levels that reproduce and capture features of a target domain, and 2) levels that have vastly different properties from the input domain.






1 Introduction

Procedural content generation via machine learning (PCGML) [31] denotes a subgroup of PCG techniques that learn models of the type of content to be generated and then sample from those models to create new instances of the content (e.g. learn from a set of example game levels and then generate new levels having characteristics and properties of the example levels). Common challenges of PCGML approaches are the generalizability of trained models across domains and finding or creating the training data needed for a given domain. As such, most PCGML level generation approaches have only explored a handful of level domains (predominantly, Super Mario Bros. [30, 6, 25], Kid Icarus [25, 24, 20, 21], and The Legend of Zelda [29]).

Recent work has begun exploring ways of addressing the above challenges. Some have explored methods for leveraging existing training data to build models that generalize across several domains. These methods either try to supplement a new domain’s training data with examples from other domains [24], build multiple models and blend them together [8, 20], or directly build a model trained on multiple domains [21]. Such approaches are pushing the field towards more generally applicable PCGML techniques, and open the door for more creative PCGML [5]. We propose an approach to level blending that falls in the latter category. Our approach blends levels from different domains together by finding and leveraging structural similarities between domains.

We build on existing PCGML research by combining two methods for generating levels, variational autoencoders (VAEs) and example–driven binary space partitioning (EDBSP). We leverage these approaches to model and generate levels at two layers of abstraction: one abstraction layer captures the structural information of the levels, and the other captures the finer domain–specific details such as object, enemy, and item placements. We test and evaluate our proposed approach across seven platforming games, several of which have not, to the best of our knowledge, been used as training or test domains in prior PCGML research.

The main contributions of this paper are:

  1. A new PCGML approach for domain blending that combines two previous techniques, VAEs for modeling and generating structural level layouts and EDBSP for filling in those generated layouts by blending details from various domains.

  2. A multi-domain evaluation of the proposed approach exploring a broader range of domains than previous work.

2 Related Work

Procedural content generation via machine learning (PCGML) [31] describes a family of approaches for PCG that first learn a model of a domain from a set of training examples and then use that learned model to generate new content. Much PCGML research has focused on building models of individual domains in order to create new content within the chosen domain. A variety of approaches have been explored in pursuit of this goal (e.g., LSTMs [30], DBNs [6], Markov models [25, 4], GANs [36], VAEs [35]), and each has shown its ability to generate levels within a chosen domain. However, these techniques are only applicable in the domains in which they are trained, and rely on the existence of training data from the target domain. For this work, among the above approaches, we chose to use VAEs. Prior work has demonstrated their potential for blending domains [21] by learning continuous latent models of the input domains. Additionally, unlike GANs, VAEs also learn the mapping from the input domain to the latent space, which may make them more suitable in a co-creative design context. This is particularly useful since we hope to develop our approach into a mixed-initiative tool in the future. Moreover, VAEs offer potential for controllability in the form of conditional VAEs.

Recently there has been work exploring PCGML approaches for blending domains and domain transfer. Guzdial and Riedl [7] proposed a level blending model that blended different level styles within a single domain. Our work differs from theirs in that ours aims to blend between multiple domains. Guzdial and Riedl [8] have also proposed a method for blending and combining complete games via conceptual expansion on learned graph representations of games. Our work instead focuses on blending levels by finding structural similarities between training domains and an input level sketch. Snodgrass and Ontanón [24] presented a domain transfer approach for supplementing one domain with translated levels from another domain by finding mappings between the representations. In our work, we instead define a uniform abstract representation across domains which we use for finding structural similarities. Sarkar and Cooper [20] trained separate LSTMs on multiple domains, and created blended levels by switching between the trained models. While the abstract level generation stage of our approach is trained separately on different domains, our full resolution level generation stage which performs the blending need not be retrained.

In blending and generating levels by combining parts of different domains, our work, like the past work referenced above [6, 7, 8, 5, 20, 21], falls under combinational creativity [2], the branch of creativity in which new ideas and concepts are generated by combining existing ones in novel ways. Such methods can help in producing and exploring new design domains via blending and combination, as we attempt to do in this work by blending existing platformer domains to create new ones.

The approaches that are most relevant to our proposed work are Sarkar et al.’s [21] use of VAEs for level generation and blending and Snodgrass’ [26] example-driven BSP approach for generating levels from an input sketch. We present a hybrid model that combines these methods into a single pipeline allowing for the creation of new sketches by sampling from VAEs to create structural level sketches, and generating fully realized blended levels by using EDBSP with access to multiple domains to fill in the details of the sketches. This work extends previous EDBSP work by using multiple domains, allowing for domain blending; and by using sketches generated by VAEs, thus highlighting the versatility of the EDBSP approach.

3 Methods

At a high level, our proposed approach is composed of two stages. First, we use a variational autoencoder (VAE) to model and generate the abstracted structural patterns from a set of training levels in a given domain. Next, we pass a generated structural level sketch to an example-driven extension to the binary space partitioning algorithm. This algorithm generates a fully realized level by finding matching structural patterns in a set of training levels across multiple domains, and using those level sections to fill in the details resulting in a blended level. Below we describe how we represent our levels, and each stage of our approach in more detail.

Figure 1: This figure shows a Lode Runner level (a), that same level represented with the full resolution representation (b), and that level represented with the sketch resolution representation (c).

3.1 Level Representations

We demonstrate our approach using a set of NES platforming games (described in Section 4.1). We represent game levels with a tile grid where a cell can take a value from a set of tile types corresponding to elements of the domain. Figure 1 (a-b) shows an example of such a representation. This style of representation is commonly used in PCGML approaches [31] and is also used by the Video Game Level Corpus (VGLC) [28]. Using this tile-based representation, we represent levels at two layers of abstraction, a Full Resolution layer and a Sketch Resolution layer. The tile types composing the full resolution layer differ between domains and correspond to specific structural components, interactive elements, enemies, and items in that domain. The sketch resolution layer, however, consists of the same three tile types across all domains:

  1. #, representing a solid/impassable element;

  2. -, representing empty space or otherwise passable elements;

  3. ?, representing a wildcard that can be interpreted as either solid or empty.

The wildcard tile extends the previous sketch resolution representation [26], and was included in this work to more easily capture structures that are not clearly represented by the empty or solid types (e.g., ladders). Figure 1 shows a Lode Runner level represented in these two abstractions.
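To make the abstraction concrete, the following sketch shows how a full resolution tile row might map to the three sketch tile types. The tile symbols used here are hypothetical; the actual VGLC tile sets differ per domain.

```python
# Hypothetical full-resolution tile types for illustration:
#   "B" solid block, "." empty, "E" enemy (passable), "#" ladder.
SKETCH_MAP = {
    "B": "#",   # solid block -> impassable
    ".": "-",   # empty space -> passable
    "E": "-",   # enemies occupy passable space
    "#": "?",   # ladder -> wildcard (interpretable as solid or empty)
}

def to_sketch(level_rows):
    """Map each full-resolution tile to its sketch-resolution type;
    unknown tiles default to passable."""
    return ["".join(SKETCH_MAP.get(t, "-") for t in row) for row in level_rows]

level = ["B..#B",
         "B.E#B"]
print(to_sketch(level))  # ['#--?#', '#--?#']
```

Under this mapping, ladders become wildcards rather than being forced into the solid/empty dichotomy, which is exactly the motivation given above.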

3.2 Generating Sketch Resolution Levels

Variational autoencoders (VAEs) [11] are generative models that learn continuous latent representations of training data which can then be sampled to produce novel outputs. Such models consist of an encoder, which maps the input data to a latent space, and a decoder, which maps points in this latent space to outputs. While vanilla autoencoders [9] learn lower-dimensional latent representations of training data by minimizing only the reconstruction error, VAEs additionally constrain the learned latent representation to model a continuous probability distribution by minimizing the KL divergence between the latent distribution and a known prior (usually a Gaussian). Thus, similar to GANs, VAEs can generate novel variations of the training data in addition to performing reconstruction. In this work, we used VAEs to generate levels at the sketch resolution layer, training a separate generative model for each domain.
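With a Gaussian prior, the KL term above has a well-known closed form per latent dimension; a minimal sketch of that term (using the log-variance parameterization typically output by a VAE encoder):

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions.
    log_var = log(sigma^2), the parameterization a VAE encoder usually
    outputs for numerical stability."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

# A latent code that matches the prior exactly incurs zero KL penalty:
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

The VAE's training loss is then the reconstruction error plus this penalty, which is what pushes the latent space toward a continuous, samplable distribution.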

3.3 Generating Full Resolution Levels

Figure 2: This figure shows the basic pipeline of the EDBSP algorithm. First, an input sketch is provided (a). This sketch can be chosen from the training data (as in Section 5.2) or generated with a VAE (as in Sections 5.1 and 5.3). Next, BSP is used to split the sketch into regions (b). Finally, structural matches for those sketch regions are found in the training data (c), and are used to create a full-resolution level (d).

Binary Space Partition (BSP) [22] is a partitioning algorithm classically used in PCG for dungeon generation. The standard BSP algorithm recursively splits regions of a map into two smaller regions using a random orientation (vertical or horizontal) and positioning within the region until some end condition is met (e.g., a specified number of regions are created). Another process then takes those regions and converts them into a level (e.g., connects regions with doors, places enemies and keys, etc.). We use an extension of BSP called Example-driven Binary Space Partition (EDBSP) [26] which uses training data to fill in the details of the produced regions. Specifically, EDBSP is given an input sketch for a level (Figure 2.a), and a set of training levels represented in both sketch and full resolution. BSP is then used to split the input sketch into regions (Figure 2.b). For each region in the sketch, all the matching sketch resolution regions in the training levels are found, and one is chosen randomly from the set for that region (Figure 2.c). The corresponding full resolution regions from the training set are then stitched together to produce the full resolution generated level (Figure 2.d).
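The pipeline above can be sketched in a few dozen lines. This is a simplified illustration, not the authors' implementation: the tile symbols and tiny example levels are hypothetical, and the real system operates on level-sized grids across multiple domains.

```python
import random

def bsp_regions(w, h, min_size=2):
    """Recursively split a w x h area into rectangles (x, y, w, h),
    as in classic BSP dungeon generation."""
    def split(x, y, rw, rh):
        if rw <= min_size and rh <= min_size:
            return [(x, y, rw, rh)]
        if rw >= rh:  # cut along the longer axis at a random position
            cut = random.randint(1, rw - 1)
            return split(x, y, cut, rh) + split(x + cut, y, rw - cut, rh)
        cut = random.randint(1, rh - 1)
        return split(x, y, rw, cut) + split(x, y + cut, rw, rh - cut)
    return split(0, 0, w, h)

def window(grid, x, y, w, h):
    return [row[x:x + w] for row in grid[y:y + h]]

def sketch_match(a, b):
    """Two sketch windows match when every tile agrees or either is '?'."""
    return all(s == t or "?" in (s, t)
               for ra, rb in zip(a, b) for s, t in zip(ra, rb))

def edbsp(sketch, examples, min_size=2):
    """Fill an input sketch with full-resolution tiles drawn from
    (sketch_grid, full_grid) training pairs -- a simplified EDBSP."""
    H, W = len(sketch), len(sketch[0])
    out = [[None] * W for _ in range(H)]
    for (x, y, w, h) in bsp_regions(W, H, min_size):
        target = window(sketch, x, y, w, h)
        # every same-size training window whose sketch matches the region
        pool = [window(full, sx, sy, w, h)
                for sk, full in examples
                for sy in range(len(sk) - h + 1)
                for sx in range(len(sk[0]) - w + 1)
                if sketch_match(target, window(sk, sx, sy, w, h))]
        fill = random.choice(pool)  # assumes at least one match exists
        for dy in range(h):
            for dx in range(w):
                out[y + dy][x + dx] = fill[dy][dx]
    return ["".join(row) for row in out]

random.seed(0)
# One training pair: "B" solid, "." empty, "E" enemy (hypothetical tiles).
level = edbsp(["##", "--"],
              [(["####", "----", "####"], ["BBBB", "..E.", "BBBB"])])
print(level[0])  # BB
```

With multiple domains in `examples`, matching regions can come from different games, which is what produces the blending.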

4 Experiments

4.1 Domains

We test our level blending approach across seven domains chosen from NES platforming games: Castlevania (CV) [12], Kid Icarus (KI) [17], Lode Runner (LR) [23], Mega Man (MM) [3], Metroid (MT) [18], Ninja Gaiden (NG) [34], and Super Mario Bros. (SM) [16]. Each of these domains differs from the others in the number of levels available in the VGLC and in the size and shape of those levels. This results in imbalanced data sets, which could lead to one domain being over-represented in the generated levels simply by having more examples to draw from. To better investigate the relationships between the domains and the capabilities of our approach, we standardize the amount of training data from each domain. Specifically, we use a subset of levels from each domain such that the training data for each domain is composed of approximately the same number of tiles; this value was chosen because it is the smallest total tile count among our domains when using all available data (i.e., the sum of tiles across all the CV levels). The set of training levels used in each domain is available online.

We divide our domains according to the presence of wildcards:

  • WildCards (WC): This set contains the domains with wildcard tiles in their sketch representations: CV, LR, MM, and NG.

  • No WildCards (¬WC): This set contains the domains that do not have wildcard tiles in their sketch representations: KI, MT (with the map split into sections according to locked doors), and SM.

  • All Domains (ALL): This set is the union of the above sets.

4.2 Experimental Setup

We test our proposed approach on its ability to generate sketches and full resolution levels. We evaluate each of the stages of our approach individually, and then the full pipeline.

4.2.1 Sketch Generation

To test the sketch generation stage of our approach on its own, we trained a separate VAE on each of the domains, using the same overall architecture for each domain except for the dimensions of the input and output segments, which we varied to suit each individual domain. For each VAE, the encoder consisted of 2 strided convolutional layers with batch normalization and leaky ReLU activation, while the decoder consisted of 3 convolutional layers which were strided or non-strided as required by the dimensions of the specific domain. The decoder also used batch normalization but with ReLU activation. All models used a 32-dimensional latent space and were trained for 5000 epochs using the Adam optimizer and a learning rate of 0.001. For generation, we selected the model from the epoch which best minimized reconstruction error. All models were implemented using PyTorch [19]. Note that we use fixed-size windows instead of full levels for training and generation. This accounts for the variation in level sizes both across and within domains, and for the fact that convolutional generative models work with fixed-size inputs and outputs. Thus, like prior work using such models for level generation [36, 21], we generated our training data by sliding a fixed-size window across the levels in each domain and trained our models on the resulting segments, after filtering out ones that contained any empty space. We used the following dimensions for each domain:

  • CV: 11x16

  • KI: 16x16

  • MM: 15x16

  • SM: 14x14

  • LR: 11x16

  • MT: 15x16

  • NG: 11x16

Note, we use different dimensions for the domains based on the height and width of the training levels.
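The sliding-window extraction described above can be sketched as follows; the tile symbols and small example level are hypothetical, and the filtering step mentioned in the text is omitted for brevity.

```python
def sliding_windows(level, win_h, win_w, stride=1):
    """Slide a fixed-size window across a tile level (a list of
    equal-length strings) and return every fully contained segment,
    as when preparing fixed-size training data for convolutional models."""
    H, W = len(level), len(level[0])
    return [[row[x:x + win_w] for row in level[y:y + win_h]]
            for y in range(0, H - win_h + 1, stride)
            for x in range(0, W - win_w + 1, stride)]

level = ["----------",
         "---#------",
         "##########"]
segments = sliding_windows(level, win_h=3, win_w=4)
print(len(segments))  # 7: (10 - 4 + 1) horizontal positions, 1 vertical
```

Each domain would use its own `win_h` x `win_w` from the dimension list above (e.g., 11x16 for CV).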

For each domain, we then generated 100 sketch resolution sections of the fixed size for that domain. For evaluating these sections, we computed the following metrics for each segment:

  • Density: the proportion of solid tiles in a region.

  • Non–Linearity: how well a segment’s topology fits to a line. It is the mean squared error of running linear regression on the highest point of each of the columns in a segment. A zero value indicates perfectly linear topology.

  • Plagiarism: a pairwise metric which counts the number of rows and columns a segment shares with another segment.

  • E–Distance: a measure of the distance between two distributions introduced by [33] and suggested as a suitable metric for evaluating generative models by [32] due to certain desirable properties. The lower the E-distance, the more similar are the distributions being compared. For our evaluations, we computed E-distance using the Density and Non-Linearity of each of the 100 generated segments and that of a random sampling of 100 training segments, per domain.

Note that we also computed these metrics for the training levels in order to compare against the generated set. The density, non–linearity, and E–distance metrics measure how well the VAE can capture and replicate the structural patterns of the training levels. The plagiarism metric measures how much the VAE copies from the training domain, and gives insight into whether the model is able to generate new sections or merely replicates existing ones. Additionally, we computed self-plagiarism, i.e., how much pairs of training segments plagiarize from each other, as a means of understanding how the plagiarism detected in the generative model compares with that which already exists in the training data. Due to the large number of training segments compared to the 100 generated segments per domain, we computed plagiarism and self-plagiarism values using a random sampling of 100 training segments. Statistical comparisons between generated and training segments were also performed using this sampling.
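The density and non-linearity metrics can be computed directly from a sketch segment; a minimal sketch, using a hand-rolled least-squares fit on the column heights as described above (the example segments are hypothetical):

```python
def density(segment, solid="#"):
    """Proportion of solid tiles in a segment (a list of strings)."""
    tiles = [t for row in segment for t in row]
    return sum(t == solid for t in tiles) / len(tiles)

def non_linearity(segment, solid="#"):
    """Mean squared error of a least-squares line fit to the highest
    solid tile of each column; 0 indicates perfectly linear topology."""
    heights = []
    for x in range(len(segment[0])):
        col = [y for y, row in enumerate(segment) if row[x] == solid]
        heights.append(len(segment) - min(col) if col else 0)
    n = len(heights)
    xs = list(range(n))
    mx, my = sum(xs) / n, sum(heights) / n
    denom = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, heights)) / denom
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, heights)) / n

flat = ["----",
        "####"]
print(density(flat))        # 0.5
print(non_linearity(flat))  # 0.0 -- flat ground is perfectly linear
```

Columns containing no solid tile are assigned height 0 here; how such columns are treated is an assumption of this sketch.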

4.2.2 Conditional Sketch Generation

In addition to training a standard VAE on each sketch domain, we also trained a conditional VAE (CVAE) [27, 37] on sketches from all domains taken together, with each sketch labeled with its corresponding domain. Conditional generative models [15], as the name suggests, enable the generation of outputs conditioned on some given input. Such models are trained simply by concatenating training data instances with the data to be used for conditioning, such as a class label. Thus, a CVAE trained as described above could enable generating sketches of a desired domain, allowing for greater control over the generation process. For our CVAE, we used a different architecture than the regular VAEs described above, with the encoder and decoder both consisting of 2 linear layers, though the latent space was still 32-dimensional. The conditioning input was a one-hot encoded vector indicating the domain of the corresponding input sketch. For training, we used segments of dimension 11x16 for all domains, as this was the largest window size that could accommodate all domains. The 11x16 segments were flattened to a single-dimensional input vector for the linear layers. Unfortunately, we did not obtain strong results using this approach, and did not use CVAE-generated sketches as inputs to EDBSP for full level generation. However, conditioning the generation process still produced interesting outputs and opens up directions for future work.
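The conditioning step above amounts to appending a one-hot domain label to the flattened sketch vector; a small sketch of that preprocessing (dimensions taken from the text, function names hypothetical):

```python
DOMAINS = ["CV", "KI", "LR", "MM", "MT", "NG", "SM"]

def one_hot(domain):
    """One-hot vector identifying the domain, used as the CVAE's
    conditioning input."""
    return [1.0 if d == domain else 0.0 for d in DOMAINS]

def condition(flat_sketch, domain):
    """Concatenate a flattened 11x16 sketch with its domain label,
    producing the input vector seen by the CVAE's linear layers."""
    return flat_sketch + one_hot(domain)

x = [0.0] * (11 * 16)      # a flattened sketch segment
cx = condition(x, "LR")
print(len(cx))             # 176 + 7 = 183
```

At generation time, the same one-hot vector is concatenated to a sampled latent code so the decoder produces a sketch of the requested domain.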

4.2.3 Full Resolution Generation

To test the full resolution generation stage of our approach on its own, we used each of the domains separately as input sketches to the EDBSP algorithm, paired with different subsets of domains as the levels used for filling in the details and blending. For this, we chose a domain, then generated a fixed total of full resolution levels for that domain, divided evenly amongst that domain's sketches (so domains with fewer sketches receive more generated levels per sketch). We performed this process for each domain, using each defined subset of domains (i.e., WC, ¬WC, ALL) as the example full resolution levels for EDBSP. While using a given domain for its sketches, we removed it from its respective training data subset. This resulted in the same number of generated levels per domain, for each subset of domains.

To test our full pipeline for level blending and generation (i.e., the sketch generation stage combined with the full resolution generation stage), we follow a similar procedure. We use the sketch sections generated for each domain by the VAEs described in Section 4.2.1 as input to the EDBSP algorithm, and for each domain generate full resolution sections from each of the generated sketches. We perform this process with each defined subset of domains (WC, ¬WC, ALL) as example full resolution levels for EDBSP, while removing the current sketch domain from the subsets. This results in a set of full resolution sections for each domain, for each defined subset of domains.

We evaluated the generator and generated levels by computing:

  • Domain Proportion: the proportion of the generated level that was generated using a given domain, computed as the number of tiles filled in from that domain divided by the total number of tiles in the level.

  • Element Distribution Similarity: the distribution of common level elements in the generated level (i.e., empty space, solid objects, enemies, items, hazardous objects, and climbable objects). We compute the KL divergence [13] between this distribution in the generated levels and the training levels.

The domain proportion measure gives insight into the biases of our generator and representation. It can also help us understand which domains are structurally similar to one another and which contain more diverse structures. The element distribution similarity measures if the generator is able to approximate a domain using examples from other domains. KL divergence has been used by others to guide level generators [14, 36] and we use it here to measure relatedness between generated levels and the target domain.
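The element distribution comparison above reduces to a discrete KL divergence between two normalized histograms; a minimal sketch (the example proportions are invented for illustration, and the epsilon guard against zero counts is an assumption of this sketch):

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) over discrete element distributions, with a small
    epsilon to guard against zero-probability categories."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# element categories: empty, solid, enemy, item, hazard, climbable
generated = [0.60, 0.30, 0.04, 0.02, 0.02, 0.02]
training  = [0.55, 0.35, 0.04, 0.02, 0.02, 0.02]
print(round(kl_divergence(generated, training), 4))
```

A value of 0 means the generated levels reproduce the target domain's element mix exactly; larger values indicate greater divergence.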

Density Non–Linearity Plagiarism E–Distance
Domain Training Generated Training Generated Training Generated
Table 1: Computed metrics for VAE-generated level sections. Marked generated values indicate statistically significant differences between the generated sections and the training levels in terms of the corresponding metric (Wilcoxon test). Metric values for generated sections that are not significantly different from those of the training levels are preferred, since they indicate that the learned distribution does not differ significantly from that of the training domain. Similarly, the lower the E-distance, the closer the learned distribution is to the training distribution.

5 Results and Discussion

Domain CVAE vs VAE CVAE vs Train VAE vs Train
Table 2: E–distance between CVAE-generated sketches and VAE-generated sketches from the corresponding domain, and between CVAE-generated sketches and 100 random sketches from the corresponding training domain. E-distances between the respective VAEs and training domains are also given for comparison.

5.1 Sketch Generation using VAEs

Table 1 depicts the results of our evaluations of the sketch sections generated using the VAEs. The results suggest that the VAE performs best in learning the distribution of NG, as exhibited by its lowest E-distance, followed by CV, SM and LR, with the models for MM and especially MT and KI performing worse with respect to these metrics. Generated sketch sections for NG were the only ones not to be significantly different from the training set in terms of both Density and Non-Linearity, with those for CV and LR being significantly different in terms of one of these, while those for the more E-distant MM, MT and KI differed in terms of both. The outlier here is SM, which has the third lowest E-distance but is significantly different in terms of both metrics. One possible explanation is that while the sections have similar mean values for the metrics, the individual values on the generated and training sections may be very different from one another. Overall, the VAEs seem to do better in domains with less dense level structures such as SM, CV and NG as opposed to those with higher density like MM, MT and KI. This makes sense, as such domains require the model to learn less complex structural elements. Note that we used the same architecture for each domain, so it is likely that the denser domains could have been better learned using more complex models. In a similar vein, domains with more uneven in-segment topology (i.e., having highly non-linear segments) are more difficult to learn than those with more linear segments. Since we trained our generators using fixed-size segments rather than whole levels, global level structure did not impact how well the generators were able to learn the input distribution. CV, MT, MM, and NG progress both horizontally and vertically, while SM progresses only horizontally and KI only vertically, but differences in VAE performance were not detected along these lines. Rather, as our results show, it is the more local segment-based properties of the training sketches that influence the quality of generated sketches. To better depict the capabilities of the generators, as per the recommendations of [32], for each domain, we show pairs of training and generated segments that were nearest and furthest with respect to each metric in Figure 6 in the Appendix.

(a) CV (b) LR (c) MM (d) NG (e) KI (f) MT (g) SM
Figure 3: Sketch sections generated by the CVAE using the same input vector but different domains as the conditioning input.

As stated previously, our conditional sketch generation efforts did not produce strong results, with most metrics differing from the input domains with statistical significance. Table 2 shows E-distances between 100 CVAE-generated sketches and 100 sketches generated using the VAE of the corresponding domain, and between the CVAE-generated sketches and 100 sketches sampled randomly from the training set. All distances are higher than those between the VAE-generated sketches and the training domain, with the exception of KI, which has the lowest E-distance in the CVAE case while having the highest in the VAE case. However, seeing how the CVAE does not do well in other domains, we consider this circumstantial and leave further investigation of the CVAE for future work. As exemplars of what is possible using this approach, Figure 3 shows segments generated using the same random vector conditioned on different domains.
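The E-distance used in these comparisons can be estimated directly from two samples of feature vectors; a small sketch of the sample energy distance (the feature pairs here are hypothetical (density, non-linearity) values, and the exact normalization used in the paper may differ):

```python
import math

def e_distance(X, Y):
    """Sample energy distance between two sets of feature vectors:
    E = 2*E||x - y|| - E||x - x'|| - E||y - y'||; lower means the
    two distributions are more similar."""
    mean = lambda vals: sum(vals) / len(vals)
    xy = mean([math.dist(x, y) for x in X for y in Y])
    xx = mean([math.dist(a, b) for a in X for b in X])
    yy = mean([math.dist(a, b) for a in Y for b in Y])
    return 2 * xy - xx - yy

A = [(0.5, 0.1), (0.6, 0.2)]
print(e_distance(A, A))  # 0.0 -- identical samples
```

Shifting one sample away from the other increases the distance, which is why a low E-distance between generated and training segments indicates a well-learned distribution.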

Table 3: Distribution of domain proportions in full resolution levels generated from existing sketches
Table 4: Distribution of domain proportions in full resolution levels generated from generated sketches

5.2 EDBSP with Training Sketches

Table 3 shows the results of the domain proportion for each domain, across sets of levels generated with existing sketches. What is immediately apparent is that Lode Runner (LR) dominates many of the generated levels when it is included in the example set, particularly in the WC set, where there are fewer example domains. This is likely because LR levels have a large proportion of wildcard tiles in their sketches as compared to the other domains (LR has the highest proportion of wildcard tiles, with Mega Man (MM) the next highest). Due to how EDBSP performs pattern matching with wildcard tiles, this yields many more viable matches for LR than for the other domains, inflating the prominence of LR in the generated levels. An example of this is shown in a generated KI level in Figure 5 (right).

The only generated set which uses LR but does not have it as the most common domain is Metroid (MT), where MM is the most common. This may be due to the similarity in the structural layouts of MM and MT levels (i.e., both domains' levels consist of large sections of horizontal and vertical traversal with smaller obstacles mixed in). Additionally, as mentioned above, MM has the second highest proportion of wildcard tiles. However, this relationship is not reciprocal: when using ALL domains, MT is the least frequent domain in the MM levels. This shows that wildcard tiles are important when finding matching examples in the training data, but when present in the input sketch they lead to more matches in all domains. When generating with the ¬WC example set, we see that MT is typically the most prevalent (or nearly the most prevalent) domain, displaying the structural diversity of the domain. Lastly, while the generated blended levels in Figures 4 and 5 may not be playable under the rules of the input sketch domain, we are not expecting the final levels to replicate a single domain, and so playability within that domain is not required. Further, recall that the end goal is a mixed-initiative tool in which the user could control the final quality of the blended levels.

Domain Uniform ALL WC ¬WC
Table 5: KL divergence between the training levels and levels generated from existing sketches using the distribution of game elements in the levels.
Domain Uniform ALL WC ¬WC
Table 6: KL divergence between the training levels and levels generated from VAE-generated sketch sections using the distribution of game elements in the levels.

5.3 Training Sketches vs Generated Sketches

Table 5 shows the KL divergence between the element distributions of the training levels, the generated levels, and a uniform distribution. Here we can see the impact the choice of training data has on generating full resolution levels. Specifically, the last three rows, containing the ¬WC domains, show that KL divergence is lowest when using the associated ¬WC training set. Alternatively, in the first four rows, the WC domains tend to have the lowest KL divergence with the levels generated using all the training data. This result shows that, generally, the WC domain sketches benefit from a variety of training data with different properties, while the ¬WC domain sketches are best filled with details from more similar domains. The outlier in this table is again LR, which has a higher KL divergence across all generated sets than any other domain and, when using the ¬WC training domains, has a higher KL divergence than when compared with the uniform distribution of elements. This is due to the high frequency of special structures in LR.

Table 4 shows the distribution of the domains in the full resolution sections generated using VAE-generated sketch sections. Here we can see the same trends as when generating with existing sketches, but exaggerated. Specifically, LR dominates the generated sections to an even higher degree. This likely results from the interaction between the size of the generated sections and the way the partitioning algorithm divides regions. EDBSP splits sections using the minimum dimension of the input as the maximum size of a region. In smaller areas, this can result in large portions of a section being assigned to one domain, which is likely to be LR given its large number of wildcards.

Table 6 shows the KL divergence between element distributions in the training levels, the levels generated from VAE-generated sketches, and a uniform distribution. This table reflects the disproportionate representation of LR in the generated sections. Notably, the KL divergence increases by a large proportion in the ALL and WC generation sets, with much less variation in the ¬WC generated sets. Additionally, the lowest KL divergences differ from those in Table 5 for CV and NG, the WC domains with lower wildcard proportions.

These results all point toward the importance of the choice of domains when blending. If the goal is to approximate a specific domain or style of level, then domains with levels similar to the desired style should be chosen; for example, approximating KI using MT and SM leads to similar element distributions. If, on the other hand, the goal is not to replicate a specific domain or style but to explore new potential domains, then mixing a variety of different domains and examples can produce levels with vastly different properties from the input domains; for example, blending MT, KI, and SM with a sketch from MM results in levels whose element distributions differ greatly from the sketch domain.

6 Conclusions

We presented a novel hybrid PCGML approach that combines Example-Driven Binary Space Partitioning and VAEs to generate and blend levels across multiple domains. Our results demonstrate that different level generation and blending goals (e.g., integrity vs. novelty) can be traded off through the choice of training domains. We consider several avenues for future work.

The experiments revealed that the choice of training domain representation can have a large impact on the resulting generated levels when blending. One avenue we would like to explore is intelligent automatic grouping of training domains. For example, if we know a priori that one set of domains has similar structures and game element distributions, versus another set that has similar structures but very different element distributions, we can better leverage the training data to guide the generator toward the user's goals (e.g., novelty vs. replication).
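One plausible starting point for such grouping is a symmetric distance between the domains' element distributions. The sketch below is an assumption-laden illustration, not part of this work: the per-domain distributions are invented, and Jensen–Shannon distance is chosen here simply because it is symmetric and bounded, unlike the KL divergence used in the evaluation.

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions
    defined over the same element alphabet."""
    m = {t: 0.5 * (p[t] + q[t]) for t in p}
    def kl(a, b):
        return sum(a[t] * math.log(a[t] / b[t]) for t in a if a[t] > 0)
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Hypothetical element distributions for three domains; real values would
# be estimated from the training levels of each domain.
dists = {
    "SM": {"solid": 0.30, "empty": 0.65, "enemy": 0.05},
    "KI": {"solid": 0.28, "empty": 0.67, "enemy": 0.05},
    "LR": {"solid": 0.10, "empty": 0.50, "enemy": 0.40},
}
names = list(dists)
pairs = {(a, b): js_distance(dists[a], dists[b])
         for i, a in enumerate(names) for b in names[i + 1:]}
```

Domains whose pairwise distance falls below a chosen threshold would be grouped for replication-style goals, while deliberately mixing distant domains would serve novelty-style goals.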

Similarly, future work could also explore different choices of abstractions. In this work, the solid/empty sketch-resolution abstraction allowed us to blend domains based on structural similarities, but other abstractions could be defined based on other affordances, such as those given in the Video Game Affordances Corpus [1]. Abstractions based on such affordances could potentially enable blending across different genres that do not share the same structural patterns and properties.
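Concretely, an abstraction is just a mapping from domain-specific tiles to abstract labels. In the sketch below, the character-to-label tables are hypothetical (VGLC-style characters), and the solid/empty table is a simplified version of this work's sketch resolution (which also includes wildcards for special structures); an affordance-based abstraction would simply swap in a different label table.

```python
# Hypothetical tile-to-label tables; each domain defines its own tile set.
# SOLID_EMPTY approximates the structural abstraction used in this work,
# while AFFORDANCES illustrates an affordance-style alternative.
SOLID_EMPTY = {"X": "solid", "#": "solid", "-": "empty", "E": "empty"}
AFFORDANCES = {"X": "impassable", "#": "climbable", "-": "passable", "E": "hazard"}

def abstract(level, mapping, default="empty"):
    """Replace each domain-specific tile with its abstract label."""
    return [[mapping.get(tile, default) for tile in row] for row in level]

level = [["-", "E", "-"], ["X", "#", "X"]]
structural = abstract(level, SOLID_EMPTY)
affordance = abstract(level, AFFORDANCES)
```

Under the structural mapping, a ladder and a wall collapse to the same label, while the affordance mapping keeps them distinct, which is what would let structurally dissimilar genres line up on shared affordances.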

Our conditional sketch generation results were not optimal, and conditioning a combined model failed to approximate the distributions of individual domains. It is likely the architecture was not well suited to the problem, but even so, the results depicted in Figure 3 suggest that this may be a promising direction to pursue. Successfully training such models would eliminate the reliance on a separate sketch-generation model for each domain. We would also like to explore other established blending and style transfer approaches. For example, how would CycleGAN [38] or pix2pix [10] perform on tile-resolution data instead of pixel-resolution data?

Lastly, we are interested in developing this approach into a mixed-initiative tool for level design and blending, allowing users to select their input domains and create sketches for the EDBSP algorithm to fill in. By leveraging VAEs to generate new sketches, we have shown that the EDBSP approach handles unseen sketches well, so user-created sketches should also be usable by the algorithm. Furthermore, the inner workings of the EDBSP algorithm are straightforward and explainable, and we would like to perform a user study to determine whether that explainability increases usability in a mixed-initiative setting.


Figure 4: The generated CV level with the lowest KL divergence in the ALL generated set (above); and the generated CV level with the highest KL divergence in the WC generated set (below). Both are cropped for space.
Figure 5: The generated KI level with the lowest KL divergence in the WC generated set (left); and the generated KI level with the highest KL divergence in the WC generated set (right). Both are cropped for space.
(a) Comparison of generated CV sketch sections and training sketch sections
(b) Comparison of generated LR sketch sections and training sketch sections
(c) Comparison of generated MM sketch sections and training sketch sections
(d) Comparison of generated NG sketch sections and training sketch sections
(e) Comparison of generated KI sketch sections and training sketch sections
(f) Comparison of generated MT sketch sections and training sketch sections
(g) Comparison of generated SM sketch sections and training sketch sections
Figure 6: VAE-generated sketch sections for each domain compared with the nearest and furthest counterparts in the training levels, based on the evaluation metrics.


  • [1] G. R. Bentley and J. C. Osborn (2019) The videogame affordances corpus. In 2019 Experimental AI in Games Workshop, Cited by: §6.
  • [2] M. A. Boden (2004) The creative mind: myths and mechanisms. Cited by: §2.
  • [3] Capcom (1987) Mega Man. Capcom. Note: Game [NES] Cited by: §4.1.
  • [4] S. Dahlskog, J. Togelius, and M. J. Nelson (2014) Linear levels through n-grams. In Proceedings of the 18th International Academic MindTrek, Cited by: §2.
  • [5] M. J. Guzdial and M. O. Riedl (2018) Combinatorial creativity for procedural content generation via machine learning. In Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §1, §2.
  • [6] M. Guzdial and M. Riedl (2016) Game level generation from gameplay videos. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: §1, §2, §2.
  • [7] M. Guzdial and M. Riedl (2016) Learning to blend computer game levels. arXiv preprint arXiv:1603.02738. Cited by: §2, §2.
  • [8] M. Guzdial and M. Riedl (2018) Automated game design via conceptual expansion. In Fourteenth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: §1, §2, §2.
  • [9] G. Hinton and R. Salakhutdinov (2006) Reducing the dimensionality of data with neural networks. Science 313 (5786), pp. 504–507. Cited by: §3.2.
  • [10] P. Isola, J. Zhu, T. Zhou, and A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §6.
  • [11] D.P. Kingma and M. Welling (2013) Auto-encoding variational Bayes. In The 2nd International Conference on Learning Representations (ICLR), Cited by: §3.2.
  • [12] Konami (1986) Castlevania. Konami. Note: Game [NES] Cited by: §4.1.
  • [13] S. Kullback and R. A. Leibler (1951) On information and sufficiency. The annals of mathematical statistics 22 (1), pp. 79–86. Cited by: 2nd item.
  • [14] S. M. Lucas and V. Volz (2019) Tile pattern KL-divergence for analysing and evolving game levels. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 170–178. Cited by: §4.2.3.
  • [15] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. Cited by: §4.2.2.
  • [16] Nintendo (1985) Super Mario Bros.. Nintendo, Kyoto, Japan. Note: Game [NES] Cited by: §4.1.
  • [17] Nintendo (1986) Kid Icarus. Nintendo. Note: Game [NES] Cited by: §4.1.
  • [18] Nintendo (1986) Metroid. Nintendo. Note: Game [NES] Cited by: §4.1.
  • [19] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, Cited by: §4.2.1.
  • [20] A. Sarkar and S. Cooper (2018) Blending levels from different games using lstms.. In 2018 Experimental AI in Games Workshop, Cited by: §1, §1, §2, §2.
  • [21] A. Sarkar, Z. Yang, and S. Cooper (2019) Controllable level blending between games using variational autoencoders. In 2019 Experimental AI in Games Workshop, Cited by: §1, §1, §2, §2, §2, §4.2.1.
  • [22] N. Shaker, A. Liapis, J. Togelius, R. Lopes, and R. Bidarra (2016) Constructive generation methods for dungeons and levels. In Procedural Content Generation in Games: A Textbook and an Overview of Current Research, N. Shaker, J. Togelius, and M. J. Nelson (Eds.), pp. 31–55. Cited by: §3.3.
  • [23] D. Smith and H. Soft (1983) Lode Runner. Broderbund. Note: Game [NES] Cited by: §4.1.
  • [24] S. Snodgrass and S. Ontanon (2016) An approach to domain transfer in procedural content generation of two-dimensional videogame levels. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: §1, §1, §2.
  • [25] S. Snodgrass and S. Ontañón (2017) Learning to generate video game maps using Markov models. IEEE Transactions on Computational Intelligence and AI in Games. Cited by: §1, §2.
  • [26] S. Snodgrass (2019) Levels from sketches with example-driven binary space partition. In The 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Cited by: §2, §3.1, §3.3.
  • [27] K. Sohn, H. Lee, and X. Yan (2015) Learning structured output representation using deep conditional generative models. In NeurIPS, Cited by: §4.2.2.
  • [28] A. J. Summerville, S. Snodgrass, M. Mateas, and S. Ontañón (2016) The VGLC: the video game level corpus. In Seventh Workshop on Procedural Content Generation at First Joint International Conference of DiGRA and FDG, Cited by: §3.1.
  • [29] A. Summerville and M. Mateas (2015) Sampling hyrule: sampling probabilistic machine learning for level generation. Cited by: §1.
  • [30] A. Summerville and M. Mateas (2016) Super Mario as a string: platformer level generation via LSTMs. Proceedings of 1st International Joint Conference of DiGRA and FDG. Cited by: §1, §2.
  • [31] A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, and J. Togelius (2018) Procedural content generation via machine learning (PCGML). IEEE Transactions on Games. Cited by: §1, §2, §3.1.
  • [32] A. Summerville (2018) Expanding expressive range: evaluation methodologies for procedural content generation. In Fourteenth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: 4th item, §5.1.
  • [33] G. J. Székely and M. L. Rizzo (2013) Energy statistics: a class of statistics based on distances. Journal of statistical planning and inference 143 (8), pp. 1249–1272. Cited by: 4th item.
  • [34] Tecmo (1988) Ninja Gaiden. Tecmo. Note: Game [NES] Cited by: §4.1.
  • [35] S. Thakkar, C. Cao, L. Wang, T. J. Choi, and J. Togelius (2019) Autoencoder and evolutionary algorithm for level generation in Lode Runner. In IEEE Conference on Games, Cited by: §2.
  • [36] V. Volz, J. Schrum, J. Liu, S. M. Lucas, A. Smith, and S. Risi (2018) Evolving mario levels in the latent space of a deep convolutional generative adversarial network. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 221–228. Cited by: §2, §4.2.1, §4.2.3.
  • [37] X. Yan, J. Yang, K. Sohn, and H. Lee (2015) Attribute2image: conditional image generation from visual attributes. arXiv preprint arXiv:1512.00570. Cited by: §4.2.2.
  • [38] J. Zhu, T. Park, P. Isola, and A. Efros (2017) Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Cited by: §6.