Generative Adversarial Network Rooms in Generative Graph Grammar Dungeons for The Legend of Zelda

Generative Adversarial Networks (GANs) have demonstrated their ability to learn patterns in data and produce new exemplars similar to, but different from, their training set in several domains, including video games. However, GANs have a fixed output size, so creating levels of arbitrary size for a dungeon crawling game is difficult. GANs also have trouble encoding semantic requirements that make levels interesting and playable. This paper combines a GAN approach to generating individual rooms with a graph grammar approach to combining rooms into a dungeon. The GAN captures design principles of individual rooms, while the graph grammar organizes rooms into a global layout with a sequence of obstacles determined by a designer. Room data from The Legend of Zelda is used to train the GAN. This approach is validated by a user study, showing that GAN dungeons are as enjoyable to play as a level from the original game and as levels generated with a graph grammar alone. However, GAN dungeons have rooms considered more complex, and dungeons from the plain graph grammar are considered the least complex and challenging. Only the GAN approach creates an extensive supply of both layouts and rooms, where rooms span the spectrum from those seen in the training set to new creations merging design principles from multiple rooms.


I Introduction

Video game developers increase replayability and reduce costs using Procedural Content Generation (PCG [11]). Instead of experiencing the game once, players see new variations on every playthrough. This concept was introduced in Rogue (1980), which procedurally generates new dungeons on every play. PCG is also applied to modern games like Minecraft (2009), where users play on generated landscapes, and No Man’s Sky (2016), where procedurally generated worlds contain procedurally generated animals. PCG encourages exploration and increases replayability.

An emerging PCG technique uses Generative Adversarial Networks (GANs [6]) to search the latent design space of video game levels, as has been done for Super Mario Bros. [17], Doom [5], an educational game [9], and the General Video Game AI (GVG-AI [10]) adaptation of The Legend of Zelda [14]. In the GVG-AI version of Zelda, single-room levels require the player to fight enemies, reach a key, and take it to the exit. The technique Torrado et al. [14] applied to this game focuses on modeling non-local dependencies with the GAN in order to ensure functional placement of the key and the exit door. Their work addresses the difficulty GANs have in learning level semantics, but the levels are restricted in scale by the size of the training instances.

This paper explores a new hybrid PCG approach for dungeon crawlers based on levels from the actual Legend of Zelda (1986). Specifically, a GAN generates rooms based on the Video Game Level Corpus (VGLC [12]) description of the game. To scale up to large dungeons with interesting challenges, rooms are organized into a dungeon using a generative graph grammar [4] which maps a high-level, human-designed mission to a sequence of room obstacles, and ultimately a complete dungeon. Combining the techniques creates new and interesting dungeons of arbitrary size.

This new technique (Graph+GAN) is evaluated with a human subject study. Thirty participants played three types of dungeons and compared their enjoyability, complexity, novelty, organization, and challenge through surveys. They played a dungeon from the original Legend of Zelda, a graph grammar dungeon with rooms from the original game, and a dungeon generated with the new Graph+GAN technique. Players rated the dungeons roughly the same on most metrics. The exceptions are that GAN rooms were rated significantly less organized, and were considered most complex by a significant number of participants.

These findings show that this technique can generate levels similar to hand-crafted dungeons from The Legend of Zelda. However, these dungeons also contain unique new content, and an unlimited number of such dungeons can be generated.

II Related Work

Procedural dungeon generation has been a topic of interest since Rogue was released in 1980. As more complex games were released, the idea of procedurally generating dungeons became more prevalent. The popular games in the Diablo series use PCG to generate dungeons, quests, and events. These features add variety and make these games more interesting and unpredictable, increasing replayability.

Procedural generation of dungeons has been widely studied in academia [16]. Some representative techniques include cellular automata [7], various evolutionary approaches [15, 1], and generative grammars [4, 3].

Dormans used a generative graph grammar to procedurally generate a dungeon mission, and a shape grammar to generate the dungeon itself [4]. Graph and shape grammars were further explored to generate dungeons similar to The Legend of Zelda: A Link to the Past (LttP) in an undergraduate thesis [8]. These dungeons required particular graph and shape grammars to produce results similar to LttP. Although new dungeon layouts were created, the rooms came from LttP rather than being generated from scratch.

A recent development is the use of GANs to model the latent design space of a level corpus. Volz et al. used a GAN to generate Super Mario levels with objective-based evolution [17]. A similar approach was later applied to Doom levels [5]. A GAN can even be replaced with an autoencoder, as was done to evolve levels for Lode Runner [13]. The approach worked in Mario despite a small data set, whereas the Doom and Lode Runner data sets were quite large.

However, for certain games it is hard to produce playable levels because of limited training data. This challenge was overcome by Park et al. with multiple GANs [9]: one GAN to create levels for a puzzle game from a small training set, and a second GAN using an augmented data set consisting of the original set plus levels from the first GAN that were actually solvable. Torrado et al. [14] used a similar approach, incorporating playable levels back into the training set when designing levels for the GVG-AI [10] version of Zelda.

In this paper, rather than make the GAN do more work, a division of labor is imposed. The GAN models the interior of individual rooms, and a generative graph grammar determines the dungeon layout and which items and obstacles are placed in each room. The result is a method that creates dungeons based on The Legend of Zelda, described next.

Fig. 1: Dungeon 4-1 from The Legend of Zelda converted to the Rogue-like engine. The goal of the dungeon is to reach the Triforce (triangle) in the top-right room. In the middle right, there is a blue # item, which is the raft. The raft allows players to cross one water tile (dark blue), which is necessary to traverse the room three spaces to the left of the room with the raft. In the original game, the raft room was underground (via stairs), and did not appear on the map. Due to limitations of the game engine used in this study, the room was added directly to the map.

III The Legend of Zelda

The Legend of Zelda has 18 dungeons across two quests (9 each), accessible via an overworld map. Each dungeon is composed of several rooms filled with enemies, items, and secret passages, and the goal of each dungeon is to find a Triforce, which completes the dungeon.

Each room is the same size. Although room layouts vary, many are reused both within and across dungeons. Rooms can be connected in a variety of ways: simple doors, doors requiring a key (Lock), doors that only open when all enemies in the room are defeated (Soft Lock), doors that open when a puzzle is solved (Puzzle), and passages that need to be bombed to open. These connections are always in a side wall of the room, though some dungeons have stairs to standalone rooms that are not part of the main map layout. Stairs are excluded from dungeons in this study.

There are many interesting items that can be collected in the game, but only a few are relevant to this paper: keys, hearts, bombs, and the raft item. Hearts allow the player to replenish their health. Bombs allow the player to blow up walls to reveal hidden doors or kill enemies. The raft item allows players to move across one water tile. It is introduced in Dungeon 4-1 (4th dungeon of Quest 1, Fig. 1) and used throughout the rest of the game.

Data about Zelda levels was obtained from the Video Game Level Corpus (VGLC [12]). This data provides text representations of the tiles present in each dungeon. Details of this representation, and how it maps to the one used in this paper, are in Table I. There are many symbols from the VGLC data, but since many of these tiles serve the same purpose as others, the tile training set is simplified.

IV Dungeon Generation

A GAN is trained to generate individual rooms, which can then be combined into dungeons using a generative graph grammar. The 2D layout of the rooms is derived in part from the graph. To assure that the dungeon is beatable, some additional walls may need to be knocked down. Users can then play a Rogue-like game in the repaired dungeon.

Tile types come from VGLC, but many were unnecessary in the simplified Rogue-like engine used to play the levels. Thus the available tile set was reduced to three relevant types: floor, wall, and water. VGLC rooms were converted to use only these three tile types when serving as training input to the discriminator, and GAN outputs were used to make rooms using only these three tiles.

Tile Type          VGLC   Rogue-like Type
Floor              F      Floor
Wall               W      Wall
Block              B      Wall
Door               D      Wall
Stair              S      Wall
Water              P      Water
Walk-able Water    O      Water
Water Block        I      Water
Monster Statue     M      Water
TABLE I: Tile Types Used in Generated Zelda Rooms.
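
The conversion in Table I amounts to a lookup from VGLC symbols to the three retained tile types. The following is a minimal Python sketch, assuming each VGLC room is given as a list of equal-length strings with one character per tile; the names `VGLC_TO_ROGUE` and `simplify_room` are hypothetical and not taken from the released code.

```python
# Hypothetical lookup derived from Table I: VGLC symbol -> simplified tile type.
VGLC_TO_ROGUE = {
    "F": "floor",  # Floor
    "W": "wall",   # Wall
    "B": "wall",   # Block
    "D": "wall",   # Door (doors are placed later, following the graph grammar)
    "S": "wall",   # Stair (stairs are excluded in this study)
    "P": "water",  # Water
    "O": "water",  # Walk-able water
    "I": "water",  # Water block
    "M": "water",  # Monster statue (labelled as a monster in VGLC)
}


def simplify_room(rows):
    """Convert a VGLC room (list of strings) into rows of simplified tile types."""
    return [[VGLC_TO_ROGUE[ch] for ch in row] for row in rows]


# Usage on a tiny made-up fragment, not an actual Zelda room.
room = ["WWWW", "WFPW", "WMDW"]
simplified = simplify_room(room)
```

Merging several VGLC symbols into one type is what lets the GAN work with only three output channels.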

IV-A Zelda GAN

To generate Zelda rooms, the same GAN architecture and code used for Mario [17] are reused (Fig. 2). The only differences are a change in output depth to accommodate a different tile type count, and a reduced latent vector size of 10, because initial experiments indicated that an unnecessarily large latent vector led to large areas of the latent space with little variation. The output width and height of the Mario GAN were kept for backwards compatibility. Zelda rooms are smaller than this output, so the GAN fills the surrounding space with floor tiles.

Fig. 2: The Zelda GAN architecture.

This GAN can be trained on any 2D tile-based level representation. The generator takes latent vectors of noise of length 10 as input, and outputs a 3D volume of vectors of length 6. Each value in each vector corresponds to a tile type in Table I, and the volume is collapsed into a 2D image in which the tile at each position is the type with the maximum value in that position's vector. (Only three of the six values are used and the rest are ignored. Earlier versions of the GAN supported more tile types, and this setting was not changed after settling on three tile types.) The upper-left portion of the image can then be interpreted as a Zelda room.
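
The collapse from the generator's output volume to a room can be sketched as an argmax over each position's vector followed by a crop of the upper-left region. This is a plain-Python illustration, not the MM-NEAT implementation; it assumes the three used values are the first three in each vector, assumes the index order floor, wall, water (`TILE_NAMES` is hypothetical), and leaves the room height and width as parameters.

```python
# Assumed meaning of the first three channels (hypothetical ordering).
TILE_NAMES = ["floor", "wall", "water"]


def collapse_output(volume, used_channels=3):
    """Collapse a height x width x depth volume into a 2D grid of tile indices.

    Only the first `used_channels` values of each vector are compared; the
    remaining values are ignored.
    """
    grid = []
    for row in volume:
        grid_row = []
        for vec in row:
            scores = vec[:used_channels]
            # Index of the largest score (the first one wins on ties).
            grid_row.append(max(range(len(scores)), key=scores.__getitem__))
        grid.append(grid_row)
    return grid


def crop_room(grid, height, width):
    """Keep only the upper-left height x width region as the room."""
    return [row[:width] for row in grid[:height]]


# Toy 2 x 2 output with vectors of length 6; the large trailing values in the
# first vector do not matter because only three channels are compared.
volume = [
    [[0.9, 0.1, 0.0, 5.0, 5.0, 5.0], [0.1, 0.8, 0.3, 0.0, 0.0, 0.0]],
    [[0.2, 0.1, 0.7, 0.0, 0.0, 0.0], [0.3, 0.2, 0.1, 0.0, 0.0, 0.0]],
]
grid = collapse_output(volume)
room = [[TILE_NAMES[i] for i in row] for row in crop_room(grid, 1, 2)]
```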

An additional discriminator network is also used during training. Its input is a one-hot encoded version of either a Zelda room from the training set or fake output produced by the generator. Over the course of 10,000 epochs, it is trained so that its single output distinguishes real Zelda rooms from generated rooms. The generator is trained alongside the discriminator until it produces convincing fake Zelda rooms. After training, the discriminator performs no better than a coin toss, and is thus discarded.

To generate the training set, the 18 dungeons in VGLC were split into rooms and encoded as GAN inputs. Because there are many repeated rooms throughout the dungeons, duplicates were eliminated. Some tiles in Zelda have a similar function, but with a different aesthetic. Since this study did not prioritize aesthetics, some tiles were merged into one, as seen in Table I. The VGLC data incorrectly designates statue tiles as monsters, but the GAN interprets them as water tiles. Additionally, doors were removed from the training data, because doors need to be placed in accordance with the game mission defined by the graph grammar.
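
Training-set preparation, as described above, removes duplicate rooms and feeds the discriminator one-hot encodings. A minimal sketch follows; the channel order in `TILE_TYPES`, the depth of 6 (matching the generator's vector length), and the choice to pad each room with floor tiles out to the GAN's output size are assumptions, and all function names are hypothetical.

```python
TILE_TYPES = ["floor", "wall", "water"]  # assumed channel order


def deduplicate(rooms):
    """Remove repeated rooms while preserving first-seen order."""
    seen = set()
    unique = []
    for room in rooms:
        key = tuple(tuple(row) for row in room)
        if key not in seen:
            seen.add(key)
            unique.append(room)
    return unique


def pad_room(room, height, width, fill="floor"):
    """Place a room in the upper-left corner and fill the rest with floor (assumed)."""
    padded = [list(row) + [fill] * (width - len(row)) for row in room]
    padded += [[fill] * width for _ in range(height - len(room))]
    return padded


def one_hot(room, depth=6):
    """Encode a grid of tile names as height x width x depth one-hot vectors."""
    encoded = []
    for row in room:
        encoded_row = []
        for tile in row:
            vec = [0] * depth
            vec[TILE_TYPES.index(tile)] = 1  # only the first 3 channels are ever set
            encoded_row.append(vec)
        encoded.append(encoded_row)
    return encoded


# Usage: two identical rooms collapse to one, then a room is padded and encoded.
rooms = [[["floor", "wall"]], [["floor", "wall"]], [["water", "floor"]]]
unique = deduplicate(rooms)
encoded = one_hot(pad_room(unique[1], 2, 3))
```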

IV-B Graph Grammar

A generative graph grammar [4] determines how rooms connect in a dungeon. The process starts with a designer-provided backbone graph representing the mission of a dungeon. The backbone includes specific rooms that must be present in the dungeon. The backbone used in this paper is Start → Enemy → Key → Lock → Enemy → Key → Puzzle → Lock → Enemy → Triforce. The backbone is a sequence of non-terminal symbols that get replaced until only terminals remain. While this backbone is linear, the designer can use any type of graph as the starting point. For each pair of symbols or single symbol there is a finite set of grammar rules defining what could replace it. For example, (Start → Enemy) could be replaced with a starting room that has two adjacent empty rooms and one adjacent enemy room, which leads to the rest of the dungeon. Each rule defines a mini-graph that is placed into the backbone and can be made up of both non-terminals and terminals. An example of the iterative replacement process is in Fig. 3. This process can generate multiple graphs representing different dungeons, but ensures that the general layout stays the same.

Fig. 3: Graph Expansion Example. The first two nodes of the initial graph are replaced with a randomly chosen sub-graph defined by the available graph grammar rules. Non-terminal symbols are represented as capital letters (yellow) and terminals as lower case letters (blue). The process repeats until there are no non-terminals.

Each non-terminal symbol defines a type of room that must be in the dungeon, but during the generation process, edges connecting to non-terminal symbols get transformed to more elaborate sub-graphs that contain terminal representations of indicated rooms.
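
The iterative replacement in Fig. 3 can be sketched as repeated node rewriting on a graph. The rule set below is invented purely for illustration, and, unlike the grammar in this paper, it only rewrites single non-terminals rather than pairs. As in Fig. 3, upper-case symbols are non-terminals and lower-case symbols are terminals; `RULES`, `BACKBONE`, and `expand` are hypothetical names.

```python
import random

# Hypothetical rules: non-terminal -> list of alternatives, each given as
# (new_symbols, internal_edges, entry_index, exit_index).
RULES = {
    "S": [(["s"], [], 0, 0), (["s", "n"], [(0, 1)], 0, 0)],
    "E": [(["e"], [], 0, 0), (["n", "e"], [(0, 1)], 0, 1)],
    "K": [(["k"], [], 0, 0), (["e", "k"], [(0, 1)], 0, 1)],
    "L": [(["l"], [], 0, 0)],
    "P": [(["p"], [], 0, 0)],
    "T": [(["t"], [], 0, 0)],
}

BACKBONE = ["S", "E", "K", "L", "E", "K", "P", "L", "E", "T"]


def expand(symbols, rules, rng):
    """Rewrite a linear backbone until every node symbol is a terminal."""
    nodes = dict(enumerate(symbols))
    edges = [(i, i + 1) for i in range(len(symbols) - 1)]
    next_id = len(symbols)
    while True:
        pending = [n for n, sym in nodes.items() if sym.isupper()]
        if not pending:
            return nodes, edges
        target = pending[0]
        new_syms, inner, entry, exit_ = rng.choice(rules[nodes.pop(target)])
        ids = list(range(next_id, next_id + len(new_syms)))
        next_id += len(new_syms)
        nodes.update(zip(ids, new_syms))
        rewired = []
        for u, v in edges:
            # Incoming edges attach to the entry node, outgoing edges to the exit node.
            if v == target:
                v = ids[entry]
            if u == target:
                u = ids[exit_]
            rewired.append((u, v))
        rewired.extend((ids[a], ids[b]) for a, b in inner)
        edges = rewired


nodes, edges = expand(BACKBONE, RULES, random.Random(0))
```

Because every alternative here produces only terminals, each rewrite removes one non-terminal and the loop always terminates; a real grammar whose rules introduce new non-terminals would need its own termination argument.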

Symbol Short Description
Start S Dungeon starting room. Only one.
Enemy E Room with random number of enemies.
Nothing N No added content.
Key K Has enemies, and key appears after defeating them.
Lock L Has door that is unlocked by a key.
Soft Lock SL Has enemies, and a door that opens when they are defeated. Also contains raft item.
Puzzle P Has door that opens when puzzle block is pushed.
Triforce T Has Triforce. Dungeon complete once found. Only one.
TABLE II: Non-terminal Graph Grammar Symbols.

Non-terminals used by the grammar are in Table II. Not all symbols in the table appear in the initial backbone, but they can be added by grammar rules. The available rules ensure that at least one Soft Lock room is in every dungeon, despite its absence from the backbone. Once a graph is created, the actual 2D layout of rooms must be determined.

IV-C Dungeon Layout

Dungeon rooms are placed in breadth-first order beginning with the start room of the graph. However, there may not be space around a room to accommodate its neighbors. To ensure that all nodes are placed, the algorithm backtracks if no space is available around a room needing a neighbor.

First, a list of edges is generated in breadth-first order from the start room. This list is iterated through, and any node in an edge not yet placed is added to the dungeon. After the start room, all nodes must be placed in relation to the first node in an edge. A random position orthogonally adjacent to the previously placed node is chosen for its neighbor. Recursive backtracking is used, so whenever all surrounding positions of a node are occupied, the search undoes the last placement and attempts an alternative that has not yet been tried. The search continues until the list of edges is exhausted. Note that only the first occurrence of each node is placed. Although the graph represents the connectivity of rooms, the 2D layout typically loses edges present in the original graph. The layout attempts to match the graph as closely as possible (Fig. 4).
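
The layout procedure can be sketched as a recursive search over the breadth-first edge list that undoes placements when a room runs out of free neighbors. This Python sketch assumes rooms are integer node IDs, directed edges are (from, to) pairs, and grid cells are (x, y) tuples; `bfs_edges` and `layout` are hypothetical names.

```python
import random
from collections import defaultdict, deque

OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def bfs_edges(edges, start):
    """List directed edges in breadth-first order from the start node."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            order.append((u, v))
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


def layout(edges, start, rng):
    """Assign grid cells to rooms with backtracking; return None if impossible."""
    order = bfs_edges(edges, start)
    pos = {start: (0, 0)}
    occupied = {(0, 0)}

    def place(i):
        if i == len(order):
            return True
        u, v = order[i]
        if v in pos:  # only the first occurrence of a node is placed
            return place(i + 1)
        x, y = pos[u]
        cells = [(x + dx, y + dy) for dx, dy in OFFSETS]
        rng.shuffle(cells)
        for cell in cells:
            if cell in occupied:
                continue
            pos[v] = cell
            occupied.add(cell)
            if place(i + 1):
                return True
            del pos[v]  # undo this placement and try another neighbor
            occupied.discard(cell)
        return False

    return pos if place(0) else None


tree = [(0, 1), (0, 2), (0, 3), (1, 4), (4, 5), (5, 6)]
placed = layout(tree, 0, random.Random(0))
```

In this sketch an edge whose endpoints were both placed earlier is simply skipped, which is one way the 2D layout can lose edges present in the graph.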

Every node corresponds to a room. Certain rooms are manipulated according to the node's symbol. Enemy nodes randomly get 1–3 enemies placed in random locations. Key nodes have a key placed in a random empty spot in the room, followed by randomly placed enemies. Lock nodes have a locked door placed at the connection leading to the next room. Soft Lock nodes have a soft locked door and randomly placed enemies. Additionally, the first soft locked room of the dungeon contains a raft item. Puzzle nodes have a door that can only be opened by finding a particular block in the room and pushing it in a specific direction. A random spot in the room, whether or not it already contains a block, becomes the puzzle block. The Triforce node has a Triforce, represented as a yellow triangle, in the middle of the room. Each normal door (one that is not locked, soft locked, or puzzle locked) has a 40% chance of being replaced by a bomb-able door.
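
The per-room decoration rules reduce to a few random choices. The sketch below covers only the enemy count and door types; it assumes that the special door of a Lock, Soft Lock, or Puzzle room sits on the connection leaving that room, and the lower-case symbols in `SPECIAL_DOORS` and the helper names are hypothetical.

```python
import random

# Hypothetical terminal symbols for rooms whose exit door is special.
SPECIAL_DOORS = {"l": "locked", "sl": "soft_locked", "p": "puzzle"}


def enemy_count(rng):
    """Enemy nodes receive between 1 and 3 enemies."""
    return rng.randint(1, 3)


def door_type(room_symbol, rng, bomb_chance=0.4):
    """Pick the type of the door leaving a room.

    Special rooms keep their required door; any other (normal) door becomes
    bomb-able with probability `bomb_chance`.
    """
    if room_symbol in SPECIAL_DOORS:
        return SPECIAL_DOORS[room_symbol]
    return "bombable" if rng.random() < bomb_chance else "normal"


rng = random.Random(1)
counts = [enemy_count(rng) for _ in range(1000)]
bomb_share = sum(door_type("e", rng) == "bombable" for _ in range(10000)) / 10000
```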

(a) Graph Representation of Dungeon

(b) Corresponding 2D Layout of Dungeon

Fig. 4: Creation of Dungeon From Graph. (a) Graph that represents a dungeon. Each node represents a room, and each edge represents a doorway between rooms. Symbols in each node indicate the type of obstacles present in the room. The graph is directed, but the player can go back and forth between rooms. The directed edges show how the player would encounter each room for the first time. (b) The complete generated dungeon based on the above graph, with specific room layouts determined by the GAN. This graph can be represented without loss of edges, but this is not possible with some graphs.

IV-D Room Repair

To assure that each dungeon is beatable, some rooms are modified to create a path between certain points of interest. A* search is used to check that dungeons are beatable. The A* state representation tracks puzzle blocks, keys, and the raft item, but ignores enemies and always assumes there are sufficient bombs for bomb-able walls.

If A* fails to beat a dungeon, then one room is modified. Each room has points of interest (POIs): doors, keys, puzzle blocks, the raft, and the Triforce. A* tracks the visited and unvisited POIs. On search failure, a random unvisited POI is chosen along with a random visited POI in the same room (if there were no visited POIs, then two unvisited POIs are chosen). A modified Bresenham’s line algorithm [2] draws floor tiles from the visited POI to the unvisited POI. Puzzle blocks are a special case requiring POIs for both before and after the push. Afterward, A* resumes where it left off. This process repeats until A* beats the dungeon.
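
The paper does not spell out how its variant of Bresenham’s line algorithm [2] is modified. One plausible modification, assumed here, is to produce a 4-connected path with no diagonal steps, on the assumption that the avatar only moves orthogonally. The sketch below therefore uses a 4-connected grid walk in the spirit of Bresenham's algorithm rather than the textbook version; `carve_path` is a hypothetical name, and rooms are assumed to be lists of lists of tile characters with positions given as (row, column).

```python
def carve_path(room, start, end, floor="F"):
    """Overwrite tiles on a 4-connected line from start to end with floor.

    Returns the carved cells in order from start to end.
    """
    (r, c), (r1, c1) = start, end
    n_r, n_c = abs(r1 - r), abs(c1 - c)
    step_r = 1 if r1 > r else -1
    step_c = 1 if c1 > c else -1
    i_r = i_c = 0
    cells = [(r, c)]
    while i_r < n_r or i_c < n_c:
        # Step along whichever axis the ideal line crosses first; this is the
        # integer form of (0.5 + i_c) / n_c < (0.5 + i_r) / n_r.
        if (1 + 2 * i_c) * n_r < (1 + 2 * i_r) * n_c:
            c += step_c
            i_c += 1
        else:
            r += step_r
            i_r += 1
        cells.append((r, c))
    for rr, cc in cells:
        room[rr][cc] = floor
    return cells


room = [["W"] * 5 for _ in range(5)]
path = carve_path(room, (0, 0), (3, 4))
```

A 4-connected path visits one more cell per diagonal move than an 8-connected one, so it removes slightly more of the room's original structure in exchange for guaranteed orthogonal walkability.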

IV-E Rogue-like Game

To interact with the dungeons, a Rogue-like game was created in Java using the AsciiPanel library by Trystans. The Rogue-like game emulates the gameplay of The Legend of Zelda. However, the game is turn-based and features only one enemy type. Many special items from Zelda are absent, but there are still bombs, and every level has a raft.

All actions are turn-based, so combat is simple. The player moves first and then the enemies. If an enemy is adjacent to the player, including diagonally, it attacks. Each enemy attack has a 50% chance of hitting and subtracting a heart from the player. The player can only attack enemies in orthogonally adjacent positions, by pressing the appropriate arrow key. When an enemy blocks the avatar’s movement, an arrow press is an attack instead of a move. When enemies are not adjacent to the player, they move toward it, but only if the player is within a line of sight of 4 tiles. Otherwise, they move randomly. Enemies can also move over water tiles.
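
The combat and movement rules above can be sketched as a few distance checks. This is an illustration only: `enemy_action` and the other helpers are hypothetical, positions are (row, column) tuples, and approximating the 4-tile line of sight by a Manhattan distance that ignores walls is an assumption rather than the engine's actual rule.

```python
import random


def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def enemy_action(enemy, player, sight=4):
    """Decide what an enemy does on its turn (walls are ignored here)."""
    if chebyshev(enemy, player) == 1:
        return "attack"  # adjacent, including diagonally
    if manhattan(enemy, player) <= sight:
        return "chase"  # assumed approximation of the line-of-sight rule
    return "wander"


def resolve_enemy_attack(rng, hearts, hit_chance=0.5):
    """An enemy attack removes one heart with probability 0.5."""
    return hearts - 1 if rng.random() < hit_chance else hearts


def player_can_attack(player, enemy):
    """The player can only attack orthogonally adjacent enemies."""
    return manhattan(player, enemy) == 1


rng = random.Random(0)
outcomes = {resolve_enemy_attack(rng, 4) for _ in range(100)}
```

The asymmetry is deliberate in the rules as described: enemies can strike diagonally while the player cannot, which partly offsets the player always moving first.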

Upon death, enemies sometimes drop a heart or a bomb. If the player with no bombs enters an empty room, enemies sometimes spawn so the player will be able to pick up bombs. There is at least one bomb-able wall in each dungeon.

V Human Subject Study

The method of dungeon generation described thus far (Graph+GAN) is evaluated by having humans compare it to two other types of dungeon: a graph grammar dungeon that does not use a GAN (Graph), and Dungeon 4-1 from Legend of Zelda (Original). Whenever a Graph dungeon places a room, it is chosen randomly from the set of all rooms in the VGLC training set. Graph and Graph+GAN dungeons seen by each participant were different. The Original dungeon played by every participant was Dungeon 4-1, because it is sufficiently interesting to represent a meaningful comparison. Some earlier dungeons are simplistic in comparison, and many later dungeons are so large that having users play them would be too time consuming. Dungeon 4-1 is also ideal because its raft item allows players to traverse obstacles in a new way, whereas many of the special items in other dungeons are weapons that introduce combat mechanics difficult to emulate in the Rogue-like engine.

The study had 30 participants (university students, faculty, and staff). Each participant played through a dungeon of each of the three types in a different order (5 per each of 6 possible orders). After each dungeon, the participant took a survey ranking the dungeon on a 1–5 scale in various categories. After the second dungeon, users indicated which of the two were better in various respects, and after the third dungeon all three were ranked relative to each other. Participants also provided open-ended text responses at each stage.
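
The counterbalancing of play order, with 5 participants for each of the 6 possible orders, can be generated as below. How participants were actually assigned to orders is not stated; the round-robin assignment in the hypothetical `assign_orders` helper is an assumption.

```python
from itertools import permutations

DUNGEON_TYPES = ("Original", "Graph", "Graph+GAN")


def assign_orders(n_participants=30):
    """Cycle through all 3! = 6 play orders so each is used equally often."""
    orders = list(permutations(DUNGEON_TYPES))
    return [orders[i % len(orders)] for i in range(n_participants)]


schedule = assign_orders()
```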

Players start each dungeon with 0 bombs, 0 keys, and 4 hearts. It was possible to die, in which case the user would start the dungeon over, but the game would be easier. The starting/max number of hearts would increase, as would the chance of defeated enemies dropping a heart pickup. After dying, the starting hearts would increase to 6, then 8, then 20. Unexpectedly, one participant did not finish one of the dungeons even with this many tries, and thus repeated the dungeon starting over at 4 hearts. The heart drop rate for defeated enemies started at 30%, and increased with each death to 60%, then 90% for the remaining deaths.
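
The difficulty adjustment after each death can be written as two schedules. This sketch assumes that once a schedule's last value is reached it stays there; the helper names are hypothetical, and the single case described above, in which a participant restarted at 4 hearts, is not modeled.

```python
HEART_SCHEDULE = [4, 6, 8, 20]      # starting/max hearts after 0, 1, 2, 3+ deaths
DROP_SCHEDULE = [0.30, 0.60, 0.90]  # heart drop chance after 0, 1, 2+ deaths


def starting_hearts(deaths):
    return HEART_SCHEDULE[min(deaths, len(HEART_SCHEDULE) - 1)]


def heart_drop_rate(deaths):
    return DROP_SCHEDULE[min(deaths, len(DROP_SCHEDULE) - 1)]
```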

All source code for running the user trials is available as part of the MM-NEAT repository release 3.2 at:

VI Results

Statistical analysis of numerical ratings and relative rankings is provided, as are objective measures of the novelty of rooms in each dungeon type. Qualitative user responses provide additional insight into the quantitative data.

VI-A Numeric Participant Ratings

Fig. 5: Participant Ratings Of Each Dungeon Type. Violin plots depict distributions of participant ratings on a 1–5 scale for properties of each dungeon type. Each group of plots shows ratings of Original, Graph, and Graph+GAN. The aspects being rated are under each group. Thicker regions indicate a larger number of ratings at the given number. White dots are median scores, and thick black rectangles range from the lower to upper quartiles. Thin black lines range from the minimum to maximum scores, unless there are outliers. For example, a single individual gave Graph+GAN a low Enjoyment rating of 2, and a single individual gave Graph a low Room Organization rating of 1. Outliers aside, different dungeon types have comparable scores in most categories. The only category with a statistically significant difference is Room Organization: Graph+GAN rooms are less organized than others.

Graph and Graph+GAN dungeons are comparable to Original in most respects. Kruskal-Wallis tests indicate that there are no significant differences between dungeon types in terms of enjoyability, challenge in finding the exit, challenge from enemies, map complexity, room complexity, and room novelty. Only in terms of room organization is there a significant difference between dungeon types, and post-hoc pairwise Mann-Whitney tests with FDR error correction show that it is specifically the Graph+GAN rooms that are less organized than rooms of both Original and Graph. Since Original and Graph make use of the same set of rooms, there is no significant difference in their level of organization. Distributions of participant ratings for each dungeon type in all categories are shown as violin plots in Fig. 5.
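
The FDR correction procedure is not named in the text; the Benjamini-Hochberg procedure is the most common choice and is assumed in this sketch of adjusted p-values. The hypothetical `benjamini_hochberg` helper is shown only to make the correction step concrete; the tests themselves are available in SciPy as `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu`.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only, not results from this study.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```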

VI-B Relative Participant Rankings of Dungeons

Fig. 6: Participant Relative Rankings Of Each Dungeon Type. Stacked bar charts show the number of participants that assigned each dungeon type a particular rank with respect to each other in each category. Categories are listed along the bottom. Each bar shows the count that ranked the given dungeon type as Least, Middle, and Most from bottom to top in green, orange, and blue. Some notable observations are that 15 participants rated Original as most enjoyable, 17 rated Graph+GAN as most complex, and 16 rated Graph+GAN as most chaotic. In contrast, 14 rated Graph as least enjoyable, 15 rated Graph as least complex, and 17 rated Graph map layouts as least challenging.

After all three dungeons, participants ranked dungeons in terms of enjoyment, room complexity, room novelty, map layout challenge level, and chaos of the rooms (Fig. 6). For each category the number of Most and Least ratings for each dungeon type were compared using exact multinomial tests.

There is no significant difference in Most ratings in the categories of enjoyment, room novelty, map challenge, or room chaos. The null hypothesis that was not rejected is that the 30 user ratings are evenly split into 10 per dungeon type. Only for room complexity was there a significant difference between Most ranks. Post-hoc pairwise binomial tests with FDR error correction indicate that Graph+GAN rooms received significantly more Most Complex ranks than Graph rooms (17 vs. 3). However, even though Graph and Original rooms come from the same set, there is no significant difference between the number of Most Complex ranks of Original and Graph+GAN (10 vs. 17). The difference between Original and Graph was also not significant (10 vs. 3).
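
The exact multinomial test against an even split, and the pairwise binomial follow-ups, can be computed by enumerating outcomes. This sketch uses the common convention of summing the probabilities of all outcomes no more likely than the observed one; the software and exact variant used in the study are not stated, so both helpers are hypothetical illustrations, and the p-values they return are uncorrected (before FDR adjustment).

```python
from math import comb, factorial


def compositions(n, k):
    """Yield every way to split n ratings among k categories."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_coefficient(counts):
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


def exact_multinomial_test(observed):
    """Exact test of a uniform split using integer arithmetic throughout."""
    n, k = sum(observed), len(observed)
    threshold = multinomial_coefficient(observed)
    total = sum(
        coeff
        for coeff in (multinomial_coefficient(c) for c in compositions(n, k))
        if coeff <= threshold
    )
    return total / k ** n


def exact_binomial_test(a, b):
    """Two-sided exact test that counts a and b come from a 50/50 split."""
    n = a + b
    threshold = comb(n, a)
    return sum(comb(n, x) for x in range(n + 1) if comb(n, x) <= threshold) / 2 ** n


# Most Complex counts from the text: Graph+GAN, Graph, Original.
p_complex = exact_multinomial_test((17, 3, 10))
p_gan_vs_graph = exact_binomial_test(17, 3)
```

Under a uniform null every outcome shares the denominator k^n, so comparing integer multinomial coefficients avoids floating-point ties between equally likely outcomes.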

For Least ranks, there was no significant difference in enjoyment, room complexity, room novelty, or room chaos. However, there was a significant difference in map challenge. Specifically, post-hoc binomial tests with FDR correction indicate that Graph received significantly more Least Challenging ranks than Original (17 vs. 4). This finding is interesting because the layouts for Graph+GAN and Graph were defined by the same algorithm. However, the differences between Original and Graph+GAN (4 vs. 9) and between Graph and Graph+GAN (17 vs. 9) were not significant.

Despite few distinctions being statistically significant, there are interesting non-significant differences. First, 15 users found Original most enjoyable. However, when compared with the 1–5 ratings in Fig. 5, it seems that the degree to which Original was more enjoyable was minor. In contrast, 16 participants found Graph+GAN rooms most chaotic. The 1–5 ratings for Room Organization relate to these responses, and indicate that GAN rooms may actually be moderately more chaotic/less organized. Original received the highest number of Most Novel ranks (13) and smallest number of Least Novel ranks (8) with respect to its rooms, whereas Graph received the most Least Novel ranks (13) and least Most Novel ranks (6). This contrast is strange because every room in Original is a room that could be in Graph dungeons. The GAN generated rooms, mostly unique to these dungeons, were ranked Most Novel 11 times and Least Novel 9 times. This confusion can be clarified with the objective measure of novelty presented next.

VI-C Objective Novelty Comparisons

An objective calculation of room novelty was made to measure differences between dungeon types. The Room Novelty of a room is its average normalized distance from all other rooms in its dungeon. The distance metric is the number of tile positions at which two rooms differ. Only the primary floor area is considered (walls and doors are excluded). Dungeon Novelty is the average novelty of all rooms in the dungeon. Summary novelty statistics are in Table III.
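
The novelty definitions translate directly into code. This sketch assumes rooms are equal-sized grids of tile characters already restricted to the primary floor area, and that the distance is normalized by dividing by the number of tile positions; the normalization and the helper names are assumptions, and at least two rooms are required per dungeon.

```python
from itertools import combinations


def room_distance(a, b):
    """Fraction of tile positions at which two equal-sized rooms differ."""
    total = sum(len(row) for row in a)
    differing = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return differing / total


def room_novelties(rooms):
    """Average distance of each room to every other room in the dungeon."""
    n = len(rooms)
    sums = [0.0] * n
    for i, j in combinations(range(n), 2):
        d = room_distance(rooms[i], rooms[j])
        sums[i] += d
        sums[j] += d
    return [s / (n - 1) for s in sums]


def dungeon_novelty(rooms):
    scores = room_novelties(rooms)
    return sum(scores) / len(scores)


# Three toy 2 x 2 floor areas (F = floor, W = wall).
rooms = [["FF", "FF"], ["FF", "FW"], ["WW", "WW"]]
scores = room_novelties(rooms)
```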

Summary statistics of novelty scores for different collections of dungeons and rooms. N is the sample size. The first three rows are based on Dungeon Novelty, and the next six on Room Novelty. Calculations are performed across all rooms in the given collections, and across only the unique rooms. Original is less novel, unless only unique rooms are considered.

Type Avg StDev Min Max
Original Dungeons 18 0.1311 0.3178
Graph Dungeons 30 0.1975 0.3759
Graph+GAN Dungeons 30 0.1970 0.3899
All Original Rooms 459 0.1545 0.6733
All Graph Rooms 491 0.2035 0.5920
All Graph+GAN Rooms 492 0.2108 0.5891
Unique Original Rooms 38 0.2453 0.6062
Unique Graph Rooms 87 0.2471 0.5334
Unique Graph+GAN Rooms 367 0.2337 0.5802
TABLE III: Objective Novelty Scores.

Comparing the novelty scores of different dungeon types using one-way ANOVA reveals significant differences. Post-hoc pairwise comparisons use Tukey’s HSD with error-adjusted p-values. Even though Graph only uses rooms from the original game, Graph is significantly more novel than Original. Here, Original refers to all 18 dungeons from the original game. Graph+GAN is also significantly more novel than Original, but not significantly different from Graph.

The novelty of Dungeon 4-1 specifically is 0.2970, which is higher than the averages for all dungeon types; a very novel dungeon was used in this study. Users explicitly mentioned this: “I enjoyed that the layout was different in almost all rooms.” Fig. 1 verifies this, and indicates why users rated the novelty of this dungeon highly, even though the set of all rooms in the original game has low novelty.

In addition to calculating novelty scores for each dungeon, averages across all rooms present in a given collection of dungeons can also be calculated (Table III). ANOVA indicates a significant difference between the room novelties of all rooms from the original game, all rooms in all 30 Graph dungeons, and all rooms in all 30 Graph+GAN dungeons. Tukey’s HSD once again indicates that Graph and Graph+GAN are significantly more novel than Original, but not significantly different from each other.

Calculations on sets of only the unique rooms of each collection are also performed (Table III), because these collections have many repeated rooms, especially those in Original. Although Graph uses the same rooms, they are sometimes modified by the repair process (Section IV-D), so Graph has more unique rooms. Graph+GAN has the most unique rooms. When reduced to only unique rooms, there is no significant difference among types, indicating that Original dungeons re-use certain rooms more heavily than the random sampling of Graph or the GAN output of Graph+GAN.

VI-D Informative Participant Quotes

Quotes contextualize the quantitative findings. In particular, why was Dungeon 4-1 appealing? Participants enjoyed the water obstacle that was only passable with the raft. One said, “water cross tool/item was enjoyable.” Another said, “I liked that you had to wait later in the level to get the water walking thing and that helped you get further in the level.”

More generally, backtracking was appealing, as one participant noted about a Graph dungeon: “I liked the need to backtrack through a couple of the dungeon rooms for necessary items if you didn’t find them first.” However, with the graph backbone used in this study, only some generated levels required the raft to be beaten; in others, players found the raft before needing it. One user said of a Graph dungeon, “I liked that this dungeon had rooms that used the raft more than the other; however, I got the raft early enough to where I didn’t have to worry about water.”

Better design of the graph backbone could enforce backtracking as in Dungeon 4-1. However, some users appreciated how expectations were subverted: “I liked that there was the raft item near the beginning of the dungeon that I could see but couldn’t reach. I felt like I had to figure out a way to get to the raft, but couldn’t.” Of a Graph+GAN dungeon: “This dungeon was very chaotic, with items you didn’t need in places you couldn’t access. I liked that a lot because it threw me off and had me thinking about different possibilities.”

Fig. 7: Spectrum of Rooms Generated by the GAN. Some are identical or nearly identical to rooms in the training set, but others seem less structured and predictable, thus showcasing the diversity of the GAN outputs, but also revealing why its rooms are sometimes considered chaotic and unorganized.

This quote supports data indicating that GAN rooms are less organized. A participant observed: “there were parts of rooms and enemies that I couldn’t reach.” This oddity could be avoided by restricting item and enemy placement to reachable locations. Reachability aside, many GAN rooms simply look more chaotic: “There was a large mix of wall and water blocks, in ways that didn’t seem completely natural. There was very little symmetry and a lot of obstacles.”
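Restricting placement to reachable locations amounts to a flood fill from the room's entrance. The sketch below is a breadth-first search over a hypothetical tile encoding ('.' walkable floor, '#' wall, '~' water), with enemies sampled only from tiles the search reaches. The encoding, the single walkable entrance coordinate, and the function names are assumptions for illustration, not the paper's engine.

```python
import random
from collections import deque


def reachable_tiles(room: list[str], start: tuple[int, int],
                    walkable: frozenset[str] = frozenset(".")) -> set[tuple[int, int]]:
    """Breadth-first flood fill: all (row, col) tiles reachable from start."""
    rows, cols = len(room), len(room[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and room[nr][nc] in walkable):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def place_enemies(room: list[str], start: tuple[int, int], count: int,
                  rng: random.Random) -> list[tuple[int, int]]:
    """Pick distinct enemy positions among reachable tiles, excluding the entrance."""
    candidates = sorted(reachable_tiles(room, start) - {start})
    return rng.sample(candidates, min(count, len(candidates)))


# The wall column cuts off the right side, which holds a water tile.
room = ["..#..",
        "..#~.",
        "..#.."]
```

Sorting the candidate set before sampling keeps results reproducible for a seeded `random.Random`, since set iteration order is not something to rely on.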

Although the GAN produces some chaotic rooms, 10 participants specifically said things like “They seemed organized” and “I felt like the rooms were organized.” The GAN also reproduces rooms from the original training set and generates unique rooms with a level of structure similar to the original rooms (Fig. 7). Because of randomness, some users saw more rooms of one kind than the other. Some people appreciated the chaotic rooms: “they were chaotic but in a good way, none seemed like a copy of the previous and kept me on my toes.”

Much criticism was directed at Graph dungeon layouts. Participants said, “I didn’t enjoy how simple the dungeon was overall,” “The map layout was very simple, not very novel,” and “this one favored simpler layouts.” Graph+GAN dungeons received few comments like this, despite using the same graph grammar. Randomness in generation may have played a role, but chaotic GAN room layouts may also have distracted players from issues with the dungeon layout.

The most criticized layout was a linear layout without much branching: “just a diagonal line, not many choices,” and “It was not as difficult to make it from room to room due to the lack of multiple bordering squares.” These complaints could be remedied by giving the dungeon backbone segments with more diverse path options. However, the main issue seems to be randomness in the 2D layout, because some Graph dungeons had interesting layouts: “The map layout had me thinking of different areas the secret doors were in. It was interesting to try and figure out where to go next.”
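One way to flag layouts like the “diagonal line” before showing them to players is to count, for each room in the 2D grid, how many of its four grid neighbors are also occupied by rooms. The sketch below computes the fraction of rooms with at least three occupied neighbors as a crude branching score. This metric is a hypothetical illustration, not one used in the paper, and grid adjacency does not by itself imply a connecting door.

```python
Coord = tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def neighbor_counts(layout: set[Coord]) -> dict[Coord, int]:
    """For each room, the number of orthogonally adjacent cells that also hold a room."""
    return {
        (x, y): sum((x + dx, y + dy) in layout for dx, dy in STEPS)
        for (x, y) in layout
    }


def branch_fraction(layout: set[Coord]) -> float:
    """Fraction of rooms bordered by three or more rooms (0.0 means no branch points)."""
    counts = neighbor_counts(layout)
    return sum(c >= 3 for c in counts.values()) / len(counts)


# A staircase ("diagonal line") layout versus a plus-shaped hub.
staircase = {(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)}
plus = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}
```

The staircase scores 0.0 and the plus-shaped hub 0.2. A generator could reject or regenerate layouts below some threshold, although any threshold would need tuning against player feedback.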

Ultimately, conflicting opinions about several aspects of Graph and Graph+GAN dungeons likely stem partly from differing user preferences and perspectives, but potentially also from the variety of dungeons these methods can produce, which makes it hard to characterize all dungeons of either type in the same way.

VII Discussion and Future Work

The Graph+GAN technique presented in this paper procedurally generates dungeons similar in enjoyment, challenge level, and complexity to Dungeon 4-1 from The Legend of Zelda. Dungeon 4-1 is special because it introduces the raft item, which makes new types of puzzles possible, so producing dungeons comparable to it is a notable result. Furthermore, the Graph+GAN technique can create an effectively unlimited number of such dungeons.

Improving the handcrafted backbone for the graph grammar could substantially improve layouts and remedy many user complaints. The dungeon generation method would stay the same, but a more skilled designer could steer it toward better output. Tweaking the backbone requires relatively little effort compared to the payoff: an effectively unlimited supply of levels. The backbone could be adjusted to force backtracking after obtaining the raft, and could provide any desired number of locked doors and/or puzzle rooms. Without a graph grammar, a designer can fix a specific level, but must expend great effort to create whole new levels that adhere to a particular high-level design plan.
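Whether a generated dungeon actually forces the player to use the raft can be checked mechanically on the room graph. The sketch assumes a hypothetical representation in which each passage between rooms is tagged with the item it requires (None if nothing is needed). It searches once without the raft and once with it, and reports the raft as required only when the raft room is reachable without the raft but the goal is not. This is a post-hoc filter rather than the paper's graph grammar; a redesigned backbone would aim to guarantee the property by construction.

```python
from collections import deque
from typing import Optional

# room -> list of (neighboring room, item needed to cross, or None)
Graph = dict[str, list[tuple[str, Optional[str]]]]


def connect(graph: Graph, a: str, b: str, needs: Optional[str] = None) -> None:
    """Add an undirected passage between rooms a and b."""
    graph.setdefault(a, []).append((b, needs))
    graph.setdefault(b, []).append((a, needs))


def reachable(graph: Graph, start: str, items: frozenset[str]) -> set[str]:
    """Rooms reachable from start when only passages whose requirement is held are open."""
    seen = {start}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        for nxt, needs in graph.get(room, []):
            if nxt not in seen and (needs is None or needs in items):
                seen.add(nxt)
                queue.append(nxt)
    return seen


def raft_is_required(graph: Graph, start: str, goal: str, raft_room: str) -> bool:
    """True if the raft can be collected, the goal needs it, and the goal is beatable with it."""
    without_raft = reachable(graph, start, frozenset())
    with_raft = reachable(graph, start, frozenset({"raft"}))
    return raft_room in without_raft and goal in with_raft and goal not in without_raft


dungeon: Graph = {}
connect(dungeon, "start", "hall")
connect(dungeon, "hall", "raft_room")            # dead end: player must backtrack
connect(dungeon, "hall", "lake", needs="raft")   # water crossing
connect(dungeon, "lake", "boss")
```

In this toy dungeon the raft room is a dead end, so the player must return through the hall before crossing the water. Adding a plain passage from `start` to `boss` would make the check fail, which is the situation participants described when they got the raft “early enough” that water never mattered.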

Both Graph and Graph+GAN techniques produce a multitude of levels, but Graph repeatedly uses the same rooms. Even when it produces a layout as interesting as an Original level, it offers nothing new in terms of rooms. In contrast, GAN rooms are less organized and were considered the most complex. Some users enjoyed the unpredictability of certain GAN rooms, but the GAN can also produce structured rooms similar to those from the original game.

In the future, it is desirable to have a data-driven method replace the graph grammar entirely. Whether GANs or some other method can be adapted for this purpose is uncertain. The variation across the 38 unique rooms in the original game (training set) seems less than the variation across the 18 dungeons. There is a combinatorial explosion of potential complexity in complete dungeons when the variety of possible rooms is taken into account, and 18 dungeons of very different sizes may not be enough for a GAN to learn general design principles. However, bootstrapping methods (that work with limited data) for applying GANs to level design are an area of active research [14, 9]. Generating the entire dungeon based on data will hopefully better capture design patterns of the original dungeons.

Though this paper shows the potential of the Graph+GAN approach, a more impressive example would utilize all details of Zelda’s levels, and create a gameplay experience closer to the original. Unfortunately, the VGLC data is lacking many details. However, the current GAN model could, without modification, generate rooms for a more intricate game if the gameplay engine were more complex.

VIII Conclusions

A new hybrid approach to generating game dungeons combining a Generative Adversarial Network with a Generative Graph Grammar was presented and validated with a user study. User responses indicate that results were comparable to a handcrafted level from The Legend of Zelda. Better design of the graph backbone, and a more sophisticated game engine could result in a more impressive experience. This new approach to Procedural Content Generation could prove valuable for commercial video games.


This research is supported in part by the Summer Collaborative Opportunities and Experiences (SCOPE) program, funded by various donors to Southwestern University.


  • [1] D. Ashlock, C. Lee, and C. McGuinness (2011) Search-Based Procedural Generation of Maze-Like Levels. Transactions on Computational Intelligence and AI in Games 3 (3), pp. 260–273.
  • [2] J. E. Bresenham (Mar. 1965) Algorithm for Computer Control of a Digital Plotter. IBM Systems Journal 4 (1), pp. 25–30. ISSN 0018-8670.
  • [3] J. Dormans and S. Bakkes (2011) Generating Missions and Spaces for Adaptable Play Experiences. Transactions on Computational Intelligence and AI in Games 3 (3), pp. 216–228.
  • [4] J. Dormans (2010) Adventures in Level Design: Generating Missions and Spaces for Action Adventure Games. In Procedural Content Generation in Games. ISBN 978-1-4503-0023-0.
  • [5] E. Giacomello, P. L. Lanzi, and D. Loiacono (2019) Searching the Latent Space of a Generative Adversarial Network to Generate DOOM Levels. In Conference on Games.
  • [6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative Adversarial Nets. In Neural Information Processing Systems, pp. 2672–2680.
  • [7] L. Johnson, G. N. Yannakakis, and J. Togelius (2010) Cellular Automata for Real-Time Generation of Infinite Cave Levels. In Procedural Content Generation in Games. ISBN 9781450300230.
  • [8] R. Lavender (2015) The Zelda Dungeon Generator: Adopting Generative Grammars to Create Levels for Action-Adventure Games. Technical report, University of Derby. Undergraduate thesis.
  • [9] K. Park, B. W. Mott, W. Min, K. E. Boyer, E. N. Wiebe, and J. C. Lester (2019) Generating Educational Game Levels with Multistep Deep Convolutional Generative Adversarial Networks. In Conference on Games.
  • [10] D. Perez-Liebana, J. Liu, A. Khalifa, R. D. Gaina, J. Togelius, and S. M. Lucas (Sep. 2019) General Video Game AI: A Multitrack Framework for Evaluating Agents, Games, and Content Generation Algorithms. Transactions on Games 11 (3), pp. 195–214. ISSN 2475-1510.
  • [11] N. Shaker, J. Togelius, and M. J. Nelson (2016) Procedural Content Generation in Games. 1st edition, Springer. ISBN 3319427148.
  • [12] A. J. Summerville, S. Snodgrass, M. Mateas, and S. Ontañón (2016) The VGLC: The Video Game Level Corpus. In Procedural Content Generation in Games.
  • [13] S. Thakkar, C. Cao, L. Wang, T. J. Choi, and J. Togelius (2019) Autoencoder and Evolutionary Algorithm for Level Generation in Lode Runner. In Conference on Games, pp. 1–4.
  • [14] R. R. Torrado, A. Khalifa, M. C. Green, N. Justesen, S. Risi, and J. Togelius (2019) Bootstrapping Conditional GANs for Video Game Level Generation. arXiv:1910.01603.
  • [15] V. Valtchanov and J. A. Brown (2012) Evolving Dungeon Crawler Levels with Relative Placement. In C* Conference on Computer Science and Software Engineering, pp. 27–35. ISBN 9781450310840.
  • [16] R. van der Linden, R. Lopes, and R. Bidarra (Mar. 2014) Procedural Generation of Dungeons. Transactions on Computational Intelligence and AI in Games 6 (1), pp. 78–89. ISSN 1943-0698.
  • [17] V. Volz, J. Schrum, J. Liu, S. M. Lucas, A. M. Smith, and S. Risi (2018) Evolving Mario Levels in the Latent Space of a Deep Convolutional Generative Adversarial Network. In Genetic and Evolutionary Computation Conference.