Generating Interactive Worlds with Text

by Angela Fan, et al.

Procedurally generating cohesive and interesting game environments is challenging and time-consuming. In order for the relationships between the game elements to be natural, common-sense has to be encoded into arrangement of the elements. In this work, we investigate a machine learning approach for world creation using content from the multi-player text adventure game environment LIGHT. We introduce neural network based models to compositionally arrange locations, characters, and objects into a coherent whole. In addition to creating worlds based on existing elements, our models can generate new game content. Humans can also leverage our models to interactively aid in worldbuilding. We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.



1 Introduction

A large component of fantasy and science fiction literature is worldbuilding: putting together an elaborate context, with interesting (but believable) details, that can serve as a backdrop to a story (or for many stories). Successful worldbuilding requires common-sense knowledge about the real world and an understanding of the expectations of the audience.

Figure 1: Sample Constructed Game World. Models arrange locations, then populate them with characters and objects. Top predictions are shown. Table 1 shows the descriptions associated with the location Town of Anoria, placed in middle left of this generated world.

In this work, we present a machine learning (ML) approach to creating a cohesive and interesting world built from elements of the text-based fantasy game environment LIGHT [Urbanek et al.2019]. These crowd-sourced elements, including descriptions of locations, characters, and objects, provide a rich source of supervision for learning common-sense relationships. Previous work in LIGHT focused on static, single-location settings using the crowd-sourced data. Instead, we focus on creating full environments for players to explore. We show how ML algorithms can learn to assemble these different elements, arranging locations and populating them with characters and objects. We use models to learn how to answer questions such as: Where is the ornate trunk likely to be? What is likely to be inside it? Where is the knight likely to be? These considerations are necessary for building a cohesive game environment.

We demonstrate that our proposed models can construct rich game environments that are diverse and preferred by human evaluators. We also develop models to generate descriptions of new locations, characters, and objects. Finally, we demonstrate that these machine learning tools can aid humans interactively in designing new game environments.

Name: Town of Anoria
Description: Town of Anoria has lots of cobble stone streets and wood houses of one story. […] The town of Anoria is inland and takes a long time to reach the sea, […]
Neighbors: Mountain’s Peak
Characters: townspeople, mysterious merchant
Objects: candle, backpack
Example location: Our model placed the Town of Anoria with an exit to the Mountain Peak, and placed characters and objects inside this location.

Character: Mysterious Merchant
Persona: I am the mysterious merchant of the village. I sell rarities from around the world that can not be purchased anywhere else. […]
Description: The merchant in town came and went without a crack of the grass beneath his feet. No one knew when he was gone, nor when he returned home […]
Carrying: pouch, cane
Wearing: hat, coat
Wielding: dagger
Example character: Our model placed the Mysterious Merchant in the Town of Anoria, along with other townspeople.

Object | Description | Affordances
pouch | The pouch is made of fine silk cloth, colored bright red. It has a leather string keeping it sealed. | container, gettable
cane | The cane is made of a very uncommon, ornate wood. | gettable
dagger | The dagger is curved with a golden hilt. | gettable, weapon
Example objects: Pouch, Cane, and Dagger, all carried by the Mysterious Merchant.

Object | Inside the Object
pouch | coins, eyeglasses
backpack | wallet, bedroll, tools
Example objects within container objects: Our model placed additional objects inside the Pouch and the Backpack.

Table 1: Game Elements include locations, characters, objects, and objects within containers. Elements have descriptions and annotations such as what a location contains.

2 Constructing Game Environments

In this section, we detail methods for learning to build game elements from compositions of sub-elements, and to build worlds from these elements.

2.1 Background on LIGHT

LIGHT is a multi-player text-based fantasy-themed virtual world. It consists of a set of crowd-sourced game locations, characters, and objects, and a game engine that controls the interactions between these. Characters can speak to each other via text, send emotes like grin or ponder, and take actions to move to different locations and interact with objects. Some example actions include go north, get shovel, or unlock door. The game engine represents the game state as a graph, and the actions by characters amount to operations on the graph. The locations, characters, and objects were crowd-sourced using Amazon Mechanical Turk. Crowd-workers were asked to provide names and descriptions for each of these aspects through natural language, for a total of 663 locations, 1755 characters, and 3462 objects. See Table 1 for examples and Figure 1 to see how our work combines the elements into a playable game environment.
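The graph view of the game state can be illustrated with a minimal sketch. This is not the LIGHT engine: the `World` class, its field names, and the two supported actions are hypothetical and only show how actions such as go north or get shovel reduce to moving a node from one parent to another.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Toy game-state graph (hypothetical, not the LIGHT engine API)."""
    # exits[location][direction] -> neighboring location
    exits: dict = field(default_factory=dict)
    # contains[node] -> set of nodes directly inside it
    contains: dict = field(default_factory=dict)

    def parent_of(self, node):
        """Return the node that directly contains `node`, or None."""
        for container, items in self.contains.items():
            if node in items:
                return container
        return None

    def move(self, node, new_parent):
        """Re-attach `node` under `new_parent` (the basic graph operation)."""
        old_parent = self.parent_of(node)
        if old_parent is not None:
            self.contains[old_parent].discard(node)
        self.contains.setdefault(new_parent, set()).add(node)

    def go(self, character, direction):
        """'go <direction>': move the character along an exit edge."""
        here = self.parent_of(character)
        target = self.exits.get(here, {}).get(direction)
        if target is None:
            return False
        self.move(character, target)
        return True

    def get(self, character, obj):
        """'get <object>': move an object from the room to the character."""
        if self.parent_of(obj) != self.parent_of(character):
            return False
        self.move(obj, character)
        return True


world = World(
    exits={"town": {"north": "mountain peak"},
           "mountain peak": {"south": "town"}},
    contains={"town": {"knight", "shovel"}, "mountain peak": set()},
)
```

In this sketch a carried object is simply a child of the character node, so it travels with the character without any extra bookkeeping.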

2.2 Building a Game World

[Urbanek et al.2019] focused on modeling character dialogue and action in pre-built locations. ML models were trained to play the game by mimicking the actions and dialogues of human players in fixed settings built by crowd-workers. In contrast, in this work, we study models for assembling the game itself rather than agents that play it. Since these elements were separately crowd-sourced, we can compose them to create a large number of different game environments.

We describe our approach from the top down. First, in Section 2.3, we discuss connecting pre-built locations together to form a world. In Section 2.4, we give our methods for filling a location with characters and objects, and describe the additional data collected to model objects contained within other objects. Next, in Section 2.5, we discuss how to generate new game elements. In Section 2.6, we describe how these methods can be modified and utilized for interactive world-building. Finally, in Section 2.7, we describe how to bring these models together to create a new game world.

2.3 Building Maps by Arranging Locations

We describe our method to train machine learning models to arrange locations in LIGHT.

Locations in LIGHT

Each location represents a place with a name and description. The description provides background information about the location and what a player might see as they enter it. Crowd-workers provided examples of neighboring locations, as well as what characters and objects would be present within the location.

Using Machine Learning to Place Locations

Game locations must be spatially arranged so as to create a logical and cohesive environment for players to explore. For example, the Wizard’s Reagent Room being located near the Wizard’s Tower would make a more intuitive game experience compared to locations being randomly placed.

To train models for this task, we use the example neighbors for each location provided by crowd-workers, obtaining triples of (location name, location description, location neighbors). We partitioned these into training, validation, and test sets such that the locations are distinct in each set (see Table 2). As each location can have multiple neighbors, the number of individual datapoints available for the prediction task is larger than the total number of locations collected.

We consider a variety of different ranking models for this task, in two settings. In the first, models have access to the location name only, and in the second, they additionally have access to the location description information. These models compare the human annotation of neighboring locations with a variety of negative candidates. These negative candidates can be thought of as distractor locations from the dataset that the model must distinguish from the human annotated location, similarly to how negative training data is sampled in the knowledge base population literature [Bordes et al.2013]. Models are trained to maximize the score of the human response and minimize the scores of the negative candidates. When constructing a new world at test time, the placed location is the highest scoring candidate from the model prediction. We use two machine learning approaches:
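The training objective described above can be sketched with a pairwise margin ranking loss. The paper only states that the score of the human response is maximized and the scores of the negative candidates are minimized; the specific loss form, the `margin` value, and the function names below are assumptions made for illustration.

```python
def margin_ranking_loss(pos_score, neg_scores, margin=0.1):
    """Hinge loss that pushes the human-annotated candidate above each negative.

    The loss is zero once the positive outscores every negative by `margin`.
    (Assumed loss form; the source does not specify one.)
    """
    return sum(max(0.0, margin - pos_score + neg) for neg in neg_scores)


def predict(candidate_scores):
    """At test time, place the highest-scoring candidate."""
    return max(candidate_scores, key=candidate_scores.get)


# Hypothetical model scores for neighbors of "Wizard's Reagent Room".
scores = {"Wizard's Tower": 1.0, "Pig Farm": 0.2, "Alchemy Shop": 0.95}
loss = margin_ranking_loss(
    scores["Wizard's Tower"],
    [scores["Pig Farm"], scores["Alchemy Shop"]],
)
```

Only the negative that comes within the margin of the positive contributes to the loss here, which is why hard distractors matter more than obviously wrong ones.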

  • Starspace: The Starspace [Wu et al.2018] model learns a bag-of-words embedding for the location information (e.g. name and description). The model encodes the location information as well as the negative candidates, and trains to maximize the inner product with the true human annotation. We initialized the Starspace model using fasttext, a method for learning vector representations of individual words. This initialization allows the model to begin training with a better understanding of the text.

  • BERT-based Models: Recent work [Devlin et al.2019] in natural language processing has shown strong performance of the BERT model, which learns to encode text in a left-to-right and right-to-left fashion by training on large quantities of text data available online. We use the BERT-based models proposed in [Urbanek et al.2019, Humeau et al.2019] to encode the location information and the negative candidates. We explore two variants:

    (1) Bi-Encoder, which encodes the candidates and input context separately. This model scores the candidates by calculating the dot product between these embeddings.

    (2) Cross-Encoder, which concatenates the context with each candidate before encoding, allowing this model to build a context-dependent representation of each candidate. This model scores candidates by projecting the vector representation of text to a scalar.
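Starspace and the Bi-Encoder share the same scoring form: the context and each candidate are embedded separately and compared with an inner product. The sketch below shows that form with a bag-of-words encoder. The two-dimensional `EMBEDDINGS` table is made up for illustration and stands in for learned word vectors; a Cross-Encoder would instead encode the concatenated context and candidate jointly and is not shown.

```python
# Hypothetical 2-d word vectors; real models learn these from data.
EMBEDDINGS = {
    "wizard": [1.0, 0.0],
    "tower": [0.9, 0.1],
    "reagent": [0.8, 0.2],
    "room": [0.5, 0.5],
    "pig": [0.0, 1.0],
    "farm": [0.1, 0.9],
}
DIM = 2


def encode(text):
    """Bag-of-words embedding: the mean of the known word vectors."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank(context, candidates):
    """Order candidates by inner product with the context embedding."""
    ctx = encode(context)
    return sorted(candidates,
                  key=lambda c: inner_product(ctx, encode(c)),
                  reverse=True)


ranking = rank("wizard reagent room", ["pig farm", "wizard tower"])
```

Because candidates are encoded independently of the context, their embeddings can be computed once and reused across queries.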

As we have a limited quantity of data for the task, we found that using input dropout to prevent overfitting was crucial for good performance for both of these models.

We compare these models that learn from the training data with three baselines:

  • Random: We report a random baseline that selects a random candidate from the provided negative candidates.

  • Data Proportional: Instead of selecting candidates fully at random from the provided negative candidates, we select proportional to the number of times that candidate appears in the training set. This leverages the data annotation information and reflects that some candidates are more likely to be used than others.

  • Information Retrieval: This model selects the candidate with the largest word overlap using TF-IDF weighting.
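The Data Proportional and Information Retrieval baselines can be sketched in a few lines. The IDF formula, whitespace tokenization, and function names are assumptions; the paper does not specify its TF-IDF variant.

```python
import math
import random
from collections import Counter


def data_proportional(candidates, train_counts, rng):
    """Sample a candidate with probability proportional to its training count."""
    weights = [train_counts.get(c, 0) for c in candidates]
    if sum(weights) == 0:
        # No candidate was seen in training: fall back to uniform sampling.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


def tfidf_retrieve(query, candidates, corpus):
    """Pick the candidate whose shared words with the query have the most IDF mass."""
    n_docs = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(doc.lower().split()))

    def idf(word):
        # Smoothed IDF (an assumed variant).
        return math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0

    query_words = set(query.lower().split())

    def score(candidate):
        return sum(idf(w) for w in query_words & set(candidate.lower().split()))

    return max(candidates, key=score)


corpus = ["wizard tower", "storage room", "throne room", "dining room", "pig farm"]
choice = tfidf_retrieve("wizard reagent room",
                        ["wizard tower", "storage room", "pig farm"], corpus)
```

In this toy corpus the word room is common and therefore carries less weight than wizard, so the retrieval baseline prefers the tower over the storage room.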

To create a map for a new game, models are used to predict the neighboring locations of each existing location. For each new location added, the model will fill in the surroundings. A location can connect to up to four neighboring locations, though not all connections need to be filled. To make the game environment more interesting and diverse, locations cannot appear multiple times in one map (e.g. Berka’s Forest Inn is only located in one place).

Adding Filler Locations

A challenge with using crowd-sourced data for all of the locations is that crowd-workers often write exciting and complex locations. However, when players explore the game environment, this tendency leads to each location being complex and overwhelming. To remedy this, we create a set of 25 filler locations such as abandoned shack, empty closet, and storage room that provide additional content between the exciting locations that crowd-workers described. Filler locations can appear multiple times (e.g. there can be multiple empty closets).

2.4 Adding Characters and Objects to Locations

We describe how to apply our methods to add characters and objects to predicted locations.

Characters in LIGHT

Each character is described by a name, persona, and a description. The persona provides information about the character, such as their background and motivation, while the description describes the character’s appearance. LIGHT also has annotations of objects characters would carry, such as a Wizard holding a staff.

Objects in LIGHT

Each object represents an item that characters can interact with, such as get shovel. Objects have a name, a description, and a set of affordances. The description lists what the object looks like and what it might be able to do. The affordances represent object properties, such as gettable and drinkable. These are used by the game engine to determine the set of possible interactions of the object. For example, objects with the drinkable affordance can be interacted with using the action drink. Objects can be inside other objects, to represent for example coins inside a wallet. We crowd-sourced additional annotations of object size and examples of other objects that could be inside.

Using Machine Learning to Place Characters and Objects

Using the characters and objects associated with locations in LIGHT as ground truth, we create training, validation, and testing data (see Table 2) to fit models to place characters and objects in locations, as well as objects within objects. To collect objects-within-objects data, crowd-workers were given an object with the container affordance and asked to name multiple objects that could be inside.

We place characters and objects using the models described in Section 2.3. Here, instead of predicting neighboring locations, models are given locations and trained to predict characters and objects, or given objects and trained to predict which objects could be inside. For example, the character prediction task would receive as input the location Wizard’s Reagent Room and predict Wizard. As the amount of data for each task is low, we employ multi-task learning and train all of the tasks (locations, characters, objects, and containers) together to increase the quantity of training data.

Task               Train  Valid  Test
Locations          914    109    110
Characters         529    305    305
Objects            359    318    256
Object Containers  359    318    256
Table 2: Dataset Statistics for World Generation: arranging locations next to each other, placing characters and objects within locations, and placing objects within objects.

2.5 Generating New Game Elements

Adding new elements to the existing LIGHT game is complex: descriptions, object affordances, character personas, and other details would need to be written. Instead, we propose using generative machine learning models to create additional content based on the name of the new item (either a location, a character, or an object). We use the same training, validation, and testing splits used in the world construction task (see Table 2). These generated items can be added to the game environment, so newly generated game worlds can incorporate them along with existing crowd-sourced elements.

We use the Transformer [Vaswani et al.2017] neural network architecture to create a Sequence-to-Sequence model to make the following predictions:

  • Given location name, predict background and description

  • Given character name, predict persona and description

  • Given object name, predict description and affordances

We compare the Transformer in two settings: with and without pretraining. As the dataset for generating new game elements is small, the generative model can be trained on a larger corpus and finetuned on this task. We use a large dataset of 2 billion Reddit comments for pretraining. Reddit comments are chosen because they are close to natural human conversation and exhibit elements of creativity and story-telling that may help generate interesting descriptions.

To be able to handle new vocabulary and ease learning, we use byte-pair encoding [Sennrich, Haddow, and Birch2016] to model subwords. Similar to Section 2.4, we train a single multi-task model to predict location, character, and object descriptions, location backgrounds, and character personas. We use top-k sampling [Fan, Lewis, and Dauphin2018] to reduce repetition during generation. Object affordances are predicted with a separate model, as multi-label classification between seven possibilities is distinct from the other tasks.
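Top-k sampling, the decoding strategy of [Fan, Lewis, and Dauphin2018], restricts each generation step to the k most probable tokens and samples among them in proportion to their probabilities. A minimal sketch of one decoding step follows; the function name and the toy logits are illustrative, and the value of k used in the paper is not stated here.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index from the k highest-scoring entries of `logits`."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the top-k entries (shifted by the max for stability).
    max_logit = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - max_logit) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 0.5, 1.5, -1.0]  # hypothetical scores over a 4-token vocabulary
samples = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(100)]
```

With k = 1 this reduces to greedy decoding; larger k trades some likelihood for diversity, which is what reduces repetition.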

2.6 Aiding Human Game Design

Machine learning models can be applied to automatically generate game environments for players, but they can also be used to aid humans in game design. Many existing game engines assist in fast and intuitive creation of different worlds already, for example providing level design tips or improving pathfinding [Graham, McCabe, and Sheridan2003]. Our methods can be used to automatically suggest neighboring locations or which characters and objects to place in the existing locations, speeding up world design.

2.7 Proposed Algorithm for World Generation

How do we use our proposed models collectively to make a new game world? First, an empty map grid is initialized to represent the number of possible locations. A percentage of grid positions are marked inaccessible to make exploration more interesting. The central location is populated randomly. We use the best performing model to iteratively fill in neighboring locations until the entire grid is populated. Then, for each placed location, the model is used to predict which characters and objects should populate that location. Finally, the model is used to predict if objects should be placed inside existing objects. Figure 1 displays an example generated world, with model predictions shown for missing elements. See Appendix for further details.
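The map-filling part of this procedure can be sketched as a breadth-first expansion from the central cell. The `score` argument stands in for the trained location model and is hypothetical, as are the default grid size and blocked fraction. As a simplification, this sketch fills every reachable free cell, whereas the paper notes that not all connections need to be filled, and it omits filler locations and the later character and object placement steps.

```python
import random
from collections import deque


def generate_map(locations, score, size=5, blocked_frac=0.2, rng=None):
    """Fill a size x size grid with distinct locations, starting at the center.

    `score(placed, candidate)` is a stand-in for the trained ranking model.
    Returns (grid, blocked): grid maps (row, col) -> location name.
    """
    rng = rng or random.Random()
    cells = [(r, c) for r in range(size) for c in range(size)]
    center = (size // 2, size // 2)

    # Mark a fraction of the non-central cells as inaccessible.
    free = [p for p in cells if p != center]
    blocked = set(rng.sample(free, int(blocked_frac * len(cells))))

    # The central location is chosen at random.
    grid = {center: rng.choice(locations)}
    unused = [loc for loc in locations if loc != grid[center]]

    frontier = deque([center])
    while frontier and unused:
        r, c = frontier.popleft()
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not unused:
                break
            in_bounds = 0 <= nb[0] < size and 0 <= nb[1] < size
            if not in_bounds or nb in grid or nb in blocked:
                continue
            # Place the best-scoring unused location; each appears at most once.
            best = max(unused, key=lambda cand: score(grid[(r, c)], cand))
            grid[nb] = best
            unused.remove(best)
            frontier.append(nb)
    return grid, blocked


def word_overlap(a, b):
    """Toy scorer used only for this example."""
    return len(set(a.lower().split()) & set(b.lower().split()))
```

For example, `generate_map(["wizard tower", "wizard room", "pig farm", "farm house"], word_overlap, size=3)` returns a grid in which no location is repeated and no blocked cell is used.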

In an interactive setting where players are able to design their own worlds, we use models to provide suggestions for which elements to place. If players enter names of game elements not present in the dataset, our generative models are used to write descriptions, personas, and affordances.

3 Related Work

Figure 2: Frequency of Location Placement in 5000 generated game environments using our models.
Figure 3: Distribution of Locations, Characters, and Objects in 5,000 generated maps. Our method generates fairly large maps (the maximum size is set to 50) and places 1-3 characters and objects in each location.
Figure 4: Number of Different Locations, Characters, and Objects as a Function of Generated Maps. As additional maps are generated, a greater diversity of game elements appears. The orange line denotes the total number of elements in the dataset.

Procedural Content Generation in Games

Using algorithms to aid game generation is a growing field as the popularity of gaming rises. Recent work has made progress on level design in various game settings [Guzdial and Riedl2018, Khalifa et al.2016, Summerville et al.2016, Van der Linden, Lopes, and Bidarra2013, Vara2014], including rhythm games [Lin, Riedl, and Xiao2019], physics games [Stephenson and Renz2016], dungeon exploration [Shaker et al.2016], and social games [Risi et al.2012].

Much prior work has focused on the task of level generation, but there are other facets of games that could be generated. For example, weapons, various items, and characters are present in game levels [Liapis, Yannakakis, and Togelius2014, Liapis et al.2018]. We focus on how the various facets could fit together within a text-based game, and how we can use them to generate an entire game environment.

Text-Based Games

Many settings for content generation in text-based games have been explored. For example, [Barros, Liapis, and Togelius2016] use text from Wikipedia to link various entities for the generation of murder mystery games. [Ammanabrolu and Riedl2019] represent a text-based adventure game as a graph and learn how to adventure within this world. Work has been done to generate Sporcle-like textual quizzes. However, designing generative algorithms to create a full environment for a multi-player text game has not been deeply explored.

Generation using Machine Learning

Generative modeling is an important topic in machine learning, outside of games or other creative endeavors. Recent works have demonstrated impressive models of images [Karras et al.2018] and text [Radford et al.2019]. Recently, statistical ML has also been proposed for creative endeavors. For example, [Gatys, Ecker, and Bethge2016] show how users can manipulate the style of images using convolutional neural networks and [Zhu et al.2017, Sbai et al.2018] describe ML-aided fashion design. There has been work in ML for music generation, see e.g. Magenta or [Briot, Hadjeres, and Pachet2017] for a survey. Most related to our world construction are methods for generating stories, poetry, and scripts [Fan, Lewis, and Dauphin2018, Ghazvininejad et al.2016, Janghorbani et al.2019, Marti et al.2018].

Work in content generation with machine learning has incorporated human guidance. For example, several works incorporate human control such as length and style to improve summarization, dialogue, and text simplification [Fan, Grangier, and Auli2018, See et al.2019, Martin et al.2019]. [Wang et al.2018] generate portions of images after human editing.

4 Evaluation and Results

We discuss several evaluations of both elements and worlds, compare methods, and discuss their successes and failures.

4.1 Diversity of Generated Worlds

Our proposed method can be used to automatically create a variety of diverse game worlds. We generate 5,000 worlds with a maximum size of 50 arranged locations and analyze these generations to understand the properties of created game environments.


The generated maps are very diverse. Figure 4 shows how the number of distinct locations used grows with the number of generated maps. With 500 generations, a large majority of different locations have been used. Over 95% of locations in the dataset are used after 5000 generations. The most commonly placed location is the king’s quarters, in 34% of the generated worlds (see Figure 2). Some locations are used very sparingly, such as the brim canal (0.06% of the worlds). Allowing our modeling approach to decide the map size, 80% of the generated worlds have more than 30 locations (see Figure 3) and about 40% of the worlds have the maximum number of locations. Example generated maps are shown in the Appendix.


Around 65% of characters in the dataset are generated after 5000 maps (Figure 4). The lower coverage is most likely because there are very specific characters created by crowd-workers that are not scored highly by models, and thus not often placed. To provide a concrete example, a specific qualified character might be in the dataset, such as an old, wizened priestess, but if that character is only mentioned once in the training set, a model might score a more generic character higher, such as priestess. The maximum number of characters placed in one room is around 15, but most locations have 0-3 characters present (see Figure 3).

Footnote 3: ML approaches are known to reflect data biases [Zhao et al.2019, Brunet et al.2019]. We found that there are a greater number of male characters in LIGHT, and this is reflected in the generated environments. We plan to investigate this in a follow-up work.


Similar to characters, around 60% of objects in the dataset are generated after 5000 maps, shown in Figure 4. Some locations contain a large number of objects, such as the Treasure Chamber, but most locations contain about 1-3 objects that players can interact with (Figure 3).

Feature               Model                  Locations  Characters  Objects  Containers
-                     Random                 8.2        5.9         5.9      5.7
-                     Data Proportional      0.0        9.8         20.1     5.9
Name Only             Information Retrieval  18.2       7.5         8.2      9.6
Name Only             Fasttext               9.1        12.8        15.6     27.4
Name Only             Starspace              44.5       17.7        13.3     20.1
Name and Description  Information Retrieval  30.0       19.0        21.9     -
Name and Description  Fasttext               28.2       17.0        16.8     -
Name and Description  Starspace              45.5       35.7        47.3     -
Name and Description  BERT Bi-Encoder        30.2       30.2        34.0     -
Name and Description  BERT Cross-Encoder     28.2       36.1        35.5     -
Table 3: Comparison of Various Approaches to Worldbuilding. We report Hits at 1 on the test set for arranging locations and populating with objects, characters, and placing objects within container objects. Starspace models perform well on all tasks.

4.2 Quality of Generated Worlds

Automatic Evaluation

We first use automatic evaluation to compare the quality of different machine learning approaches to the location, character, object, and container prediction tasks. We measure Hits at 1, or the percentage of time the correct candidate is ranked first amongst the negative candidates. If the model always predicted what the crowd-workers annotated, then this metric would have the value 100. Containers are evaluated in the Name only variant — as crowd-workers were able to write any object, not all of their written choices have descriptions.
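As a concrete reading of the metric, the sketch below computes Hits at 1 for a list of examples, each consisting of a context, the crowd-worker annotation, and a set of negative candidates. The `score` callable stands in for any of the compared models; counting ties as misses is an assumption, since the paper does not say how ties are handled.

```python
def hits_at_1(examples, score):
    """Percentage of examples where the gold candidate outscores every negative.

    `examples` is a list of (context, gold, negatives) tuples and
    `score(context, candidate)` is a stand-in for a trained model.
    Ties count as misses (an assumed convention).
    """
    hits = 0
    for context, gold, negatives in examples:
        gold_score = score(context, gold)
        if all(gold_score > score(context, neg) for neg in negatives):
            hits += 1
    return 100.0 * hits / len(examples)


def word_overlap(a, b):
    """Toy scorer used only for this example."""
    return len(set(a.lower().split()) & set(b.lower().split()))


examples = [
    ("wizard reagent room", "wizard", ["pig", "farmer"]),  # gold ranked first
    ("pig farm", "farmer", ["pig"]),                       # a negative wins
]
result = hits_at_1(examples, word_overlap)
```

The second example shows why a pure word-overlap scorer is a weak baseline: farmer shares no token with pig farm, so the distractor pig is ranked first.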

Results are shown in Table 3. Leveraging the data distribution to weight random sampling provides a strong baseline for characters and objects, as a few are quite common. Providing the description text is helpful for improving prediction quality compared to having access only to the name feature. Amongst the various approaches, Starspace models show strong performance, particularly on the location prediction task. The Bi and Cross Encoder models are very large neural networks and may be overfitting on the much smaller LIGHT world creation training data. Further, they are pretrained on non-domain specific data, which may negatively impact performance.

Human Assessments

We conduct human evaluation to compare the various approaches to world generation. We compare the performance of two models pairwise by starting in the same location and using the models to iteratively predict subsequent locations. Locations are then populated with characters and objects. After each location, human evaluators are asked which model was able to place more logical and interesting characters and objects. After five steps through predicted locations, human evaluators are asked which model path they prefer as more natural, cohesive, and interesting. We compare four different approaches:

  • Random: Locations, characters, and objects were randomly selected from the set of all possible datapoints.

  • Starspace: The model described in Section 2 was used to predict which locations should be linked in the path, and the characters and objects present in each location.

  • Data Created Paths: This method uses the existing dataset of annotated locations and their neighbors to construct a path. The characters and objects present in each location are from the original crowd-sourcing tasks. In contrast to Starspace, the number of possible paths that could be created with existing data is limited. For example, if a room has only one annotated neighbor, it would always be arranged in the same manner.

  • Human Annotated Paths: Human evaluators constructed paths by manually linking locations. Here a single evaluator created an entire path — in contrast, in Data Created Paths, annotators during initial data collection only provided a one-step neighbor, rather than one person creating the entire path. The characters and objects are from the original crowd-sourcing. While human-created paths could be high quality, such a method does not scale to large worlds as it is costly and time consuming.

As shown in Figure 5, human evaluators prefer Human Annotated Paths the most, but Starspace prediction models perform strongly as well. Starspace is strongly preferred over Random and the predicted location paths are preferred over Data Created Paths over 60% of the time.

Figure 5: Human Evaluation of World Construction. The number indicates percentage preference, with the arrow pointing to the winner. The first row in each box is location preference, second character, and third object.
Figure 6: Machine Learning Models Aiding Human Game Design. Having access to model suggestions for which elements to place next allows evaluators to create fun and interesting game environments more quickly.
Model       Location  Character  Object  Affordance
- Pretrain  23.3      28.4       32.5    57.0
+ Pretrain  26.3      29.3       38.8    62.6
Table 4: Comparison of Game Element Generation. We report F1 on the test set.
Given Location: The forest
Generated Background: This forest has been a part of the kingdom for centuries. It is a great resource to have when needed. Most people consider it a necessity to have a peaceful and peaceful environment.
Generated Description: The forest is filled with wild animals that live in the forest, but are unable to climb up trees. It is clear the trees are not in use.

Given Character: A Princess
Generated Persona: I love to help people and I love my family. I spend my days fighting, protecting my people and my children. I love to help the royal family!
Generated Description: She is a fierce looking princess that has been trained to protect any people.

Given Object: wooden sword
Generated Description: This wooden sword is worn down and damaged, to say the least, it is a sword that would be useful for a knight’s duty.
Generated Affordances: gettable, weapon

Table 5: Generations of New Game Elements, given the desired name. Examples are chosen from the test set.

4.3 Generation of New Game Elements

To evaluate the quality of automatically generating new game elements using our proposed models, we report F1, a metric of word overlap. For this metric, the text is lowercased and the overlap between tokens is computed. Pretraining increases the performance on all generation tasks, as shown in Table 4. For character descriptions and personas, the effect of pretraining is minimal. We hypothesize this is due to the slightly more templated nature of written personas, as many begin with I am a. Example generations are shown in Table 5. Our generative models are able to write interesting, new, and generally coherent descriptions for a variety of different game elements (see Appendix for additional examples). We analyze the n-gram overlap of our generated game elements with the training set to understand how much of the written text is novel. We find that 34% of generated 3-grams are present in the training set (largely common phrases), but only 2.5% of generated 5-grams are present in the training set. As we are generating text with top-k sampling, the models do not tend to copy long sequences.
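Both measurements in this section reduce to token counting. The sketch below gives one plausible implementation of the word-overlap F1 and of the fraction of generated n-grams that also occur in the training set. Whitespace tokenization and counting n-gram occurrences rather than unique n-grams are assumptions; the paper does not give these details.

```python
from collections import Counter


def f1(prediction, reference):
    """Token-level F1 between lowercased, whitespace-tokenized texts."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def seen_fraction(generated_texts, training_texts, n):
    """Fraction of generated n-gram occurrences that appear in the training set."""
    train = {g for t in training_texts for g in ngrams(t.lower().split(), n)}
    generated = [g for t in generated_texts for g in ngrams(t.lower().split(), n)]
    if not generated:
        return 0.0
    return sum(g in train for g in generated) / len(generated)


score = f1("The forest is dark", "the forest is old and dark")
overlap = seen_fraction(["the forest is dark"], ["the forest is old"], 3)
```

A seen fraction that drops sharply from 3-grams to 5-grams, as reported above, indicates that the model reuses common short phrases but rarely copies long spans.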

4.4 ML-aided interactive world creation

To quantify if models can aid players in designing their own worlds, evaluators designed a nine-location game environment. Evaluators were explicitly told the goal was to make a text-based game interesting and fun. To add game elements, they have access to a search bar with autocomplete, so they can type what they wish to place and select from a list (see Appendix for an image of the user interface). Half of the evaluators have access to model predictions, which are surfaced as suggestions at the top of the search dropdown. However, they can choose to ignore the suggestions.

Evaluators created 10 game environments with access to model suggestions and 10 without. While they reported similar satisfaction with the diversity and quality of their generated worlds in both settings, the time spent differed. Evaluators spent 10 minutes or less to create maps with suggestions and 10-20 minutes without suggestions (Figure 6, top left). Those with suggestions were also more likely to say they would want to play an actual game in their created world (Figure 6, top right). Finally, evaluators had a positive reaction to model suggestions (Figure 6, bottom): 100% of evaluators agreed that suggestions made it faster to create a world and that they would definitely want to have them again, 80% said they often chose from suggestions, and 90% said the suggestions were diverse. Freeform feedback was positive, with comments such as "suggestions foster creativity" and "especially for characters, the suggestions showed what I wanted". Additional results are shown in the Appendix.

5 Conclusion

We proposed a method to procedurally generate game environments by using machine learning algorithms to arrange locations, place characters and objects within those locations and objects within containers, and write descriptions for new game elements. We explored different neural network based models for these tasks, and show with various automatic metrics and human studies that the maps generated by our approach are cohesive, interesting and diverse. Finally, we show that our machine learning approach can be used to aid humans in creating game worlds as well. Together, these steps show a path to creating cohesive game worlds from crowd-sourced content, both with model-assisted human creation tooling and fully automated generation.


  • [Ammanabrolu and Riedl2019] Ammanabrolu, P., and Riedl, M. 2019. Playing text-adventure games with graph-based deep reinforcement learning. In NAACL-HLT, 3557–3565.
  • [Barros, Liapis, and Togelius2016] Barros, G. A.; Liapis, A.; and Togelius, J. 2016. Murder mystery generation from open data. In Proceedings of the International Conference on Computational Creativity.
  • [Bordes et al.2013] Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, 2787–2795.
  • [Briot, Hadjeres, and Pachet2017] Briot, J.; Hadjeres, G.; and Pachet, F. 2017. Deep learning techniques for music generation - A survey. CoRR.
  • [Brunet et al.2019] Brunet, M.; Alkalay-Houlihan, C.; Anderson, A.; and Zemel, R. S. 2019. Understanding the origins of bias in word embeddings. In ICML.
  • [Devlin et al.2019] Devlin, J.; Chang, M.; Lee, K.; and Toutanova, K. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 4171–4186.
  • [Fan, Grangier, and Auli2018] Fan, A.; Grangier, D.; and Auli, M. 2018. Controllable abstractive summarization. In ACL Workshop on Neural Machine Translation and Generation.
  • [Fan, Lewis, and Dauphin2018] Fan, A.; Lewis, M.; and Dauphin, Y. 2018. Hierarchical neural story generation. In ACL.
  • [Gatys, Ecker, and Bethge2016] Gatys, L. A.; Ecker, A. S.; and Bethge, M. 2016. Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2414–2423.
  • [Ghazvininejad et al.2016] Ghazvininejad, M.; Shi, X.; Choi, Y.; and Knight, K. 2016. Generating topical poetry. In EMNLP, 1183–1191.
  • [Graham, McCabe, and Sheridan2003] Graham, R.; McCabe, H.; and Sheridan, S. 2003. Pathfinding in computer games. The ITB Journal 4(2):6.
  • [Guzdial and Riedl2018] Guzdial, M., and Riedl, M. 2018. Automated game design via conceptual expansion. In Fourteenth Artificial Intelligence and Interactive Digital Entertainment Conference.
  • [Humeau et al.2019] Humeau, S.; Shuster, K.; Lachaux, M.-A.; and Weston, J. 2019. Real-time inference in multi-sentence tasks with deep pretrained transformers. arXiv preprint arXiv:1905.01969.
  • [Janghorbani et al.2019] Janghorbani, S.; Modi, A.; Buhmann, J.; and Kapadia, M. 2019. Domain authoring assistant for intelligent virtual agent. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 104–112. International Foundation for Autonomous Agents and Multiagent Systems.
  • [Karras et al.2018] Karras, T.; Aila, T.; Laine, S.; and Lehtinen, J. 2018. Progressive growing of gans for improved quality, stability, and variation. In ICLR.
  • [Khalifa et al.2016] Khalifa, A.; Perez-Liebana, D.; Lucas, S. M.; and Togelius, J. 2016. General video game level generation. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, 253–259.
  • [Liapis et al.2018] Liapis, A.; Yannakakis, G. N.; Nelson, M. J.; Preuss, M.; and Bidarra, R. 2018. Orchestrating game generation. IEEE Transactions on Games 11(1):48–68.
  • [Liapis, Yannakakis, and Togelius2014] Liapis, A.; Yannakakis, G. N.; and Togelius, J. 2014. Computational game creativity. Citeseer.
  • [Lin, Riedl, and Xiao2019] Lin, Z.; Riedl, M.; and Xiao, K. 2019. Generationmania: Learning to semantically choreograph. In Proceedings of the 2nd Workshop on Knowledge Extraction from Games.
  • [Marti et al.2018] Marti, M.; Vieli, J.; Witoń, W.; Sanghrajka, R.; Inversini, D.; Wotruba, D.; Simo, I.; Schriber, S.; Kapadia, M.; and Gross, M. 2018. Cardinal: Computer assisted authoring of movie scripts. In 23rd International Conference on Intelligent User Interfaces, 509–519. ACM.
  • [Martin et al.2019] Martin, L.; de la Clergerie, E.; Sagot, B.; and Bordes, A. 2019. Controllable sentence simplification.
  • [Radford et al.2019] Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; and Sutskever, I. 2019. Language models are unsupervised multitask learners.
  • [Risi et al.2012] Risi, S.; Lehman, J.; D’Ambrosio, D. B.; Hall, R.; and Stanley, K. O. 2012. Combining search-based procedural content generation and social gaming in the petalz video game. In Eighth Artificial Intelligence and Interactive Digital Entertainment Conference.
  • [Sbai et al.2018] Sbai, O.; Elhoseiny, M.; Bordes, A.; LeCun, Y.; and Couprie, C. 2018. Design: Design inspiration from generative networks. In Proceedings of the European Conference on Computer Vision (ECCV), 0–0.
  • [See et al.2019] See, A.; Roller, S.; Kiela, D.; and Weston, J. 2019. What makes a good conversation? how controllable attributes affect human judgments. In NAACL-HLT, 1702–1723.
  • [Sennrich, Haddow, and Birch2016] Sennrich, R.; Haddow, B.; and Birch, A. 2016. Neural machine translation of rare words with subword units. In ACL.
  • [Shaker et al.2016] Shaker, N.; Liapis, A.; Togelius, J.; Lopes, R.; and Bidarra, R. 2016. Constructive generation methods for dungeons and levels. In Procedural Content Generation in Games. Springer. 31–55.
  • [Stephenson and Renz2016] Stephenson, M., and Renz, J. 2016. Procedural generation of levels for angry birds style physics games. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference.
  • [Summerville et al.2016] Summerville, A.; Guzdial, M.; Mateas, M.; and Riedl, M. O. 2016. Learning player tailored content from observation: Platformer level generation from video traces using lstms. In 12th Artificial Intelligence and Interactive Digital Entertainment Conference.
  • [Urbanek et al.2019] Urbanek, J.; Fan, A.; Karamcheti, S.; Jain, S.; Humeau, S.; Dinan, E.; Rocktäschel, T.; Kiela, D.; Szlam, A.; and Weston, J. 2019. Learning to speak and act in a fantasy text adventure game. In EMNLP.
  • [Van der Linden, Lopes, and Bidarra2013] Van der Linden, R.; Lopes, R.; and Bidarra, R. 2013. Designing procedurally generated levels. In 9th Artificial Intelligence and Interactive Digital Entertainment Conference.
  • [Vara2014] Vara, C. F. 2014. Creating dreamlike game worlds through procedural content generation. In Seventh Intelligent Narrative Technologies Workshop.
  • [Vaswani et al.2017] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. NIPS.
  • [Wang et al.2018] Wang, T.-C.; Liu, M.-Y.; Zhu, J.-Y.; Tao, A.; Kautz, J.; and Catanzaro, B. 2018. High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE conference on computer vision and pattern recognition, 8798–8807.
  • [Wu et al.2018] Wu, L. Y.; Fisch, A.; Chopra, S.; Adams, K.; Bordes, A.; and Weston, J. 2018. Starspace: Embed all the things! In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [Zhao et al.2019] Zhao, J.; Wang, T.; Yatskar, M.; Cotterell, R.; Ordonez, V.; and Chang, K. 2019. Gender bias in contextualized word embeddings. In NAACL-HLT, 629–634.
  • [Zhu et al.2017] Zhu, S.; Urtasun, R.; Fidler, S.; Lin, D.; and Change Loy, C. 2017. Be your own prada: Fashion synthesis with structural coherence. In Proceedings of the IEEE International Conference on Computer Vision, 1680–1688.

6 Appendix

6.1 Pseudocode for Assembling a LIGHT World

In Algorithm 1, we describe in detail how to create a new, playable game environment for LIGHT using our proposed methods.

6.2 Model Details


Models were trained with embedding size 128 and embedding norm 10, initialized with fasttext embeddings. We trained with learning rate 0.01 and input dropout 0.5. We modeled a vocabulary of 10749 tokens.


Bi-Encoder and Cross-Encoder models leverage the BERT model [Devlin et al.2019]. We finetuned them on the LIGHT tasks by warming up for 200 updates. We truncate the contexts and labels to 300 tokens. We train with input dropout 0.5.

Generative Transformer contains 8 encoder layers and 8 decoder layers with 16 attention heads and 2048 FFN size. We model a BPE-based vocabulary of 54940 tokens. The text is truncated at 512 tokens. We optimize perplexity using Adam.

6.3 ML-Aided Game World Creation

The user interface that evaluators had access to is depicted in Figure 7. Evaluators were shown a grid of nine locations, with the center location populated randomly. This was done to give evaluators a starting point, and to encourage generation of diverse worlds. Evaluators can click on a location to highlight it (in Figure 7, the upper right location has been highlighted). Then, evaluators can use the different search bars to add locations, characters, and objects respectively. Once a map tile has been filled, the name of the location is labeled and the color changes according to the category of the location. For example, forests are colored green. We additionally mapped each character and object to an emoji, so the map tile would have a visual depiction to remind evaluators what they have already placed. For example, the Central Bazaar location in the bottom right of Figure 7 has a shopkeeper as a character and spices as an object.
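The map state behind such an interface could be represented roughly as follows. This is a hypothetical sketch, not the interface's actual implementation: only the 3x3 grid, the randomly populated center, the forest-to-green coloring, and the emoji mapping idea come from the description above, while the `Tile` class, the other colors, and the specific emoji are placeholders.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

# Only "forest -> green" is stated in the text; the other entries are placeholders.
CATEGORY_COLORS = {"forest": "green", "bazaar": "orange"}
# Hypothetical emoji assignments for illustration.
EMOJI = {"shopkeeper": "\U0001F9D1", "spices": "\U0001F336"}


@dataclass
class Tile:
    name: Optional[str] = None
    category: Optional[str] = None
    characters: List[str] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)

    @property
    def color(self):
        # Assumption: empty tiles are white, unknown categories gray.
        if self.name is None:
            return "white"
        return CATEGORY_COLORS.get(self.category, "gray")

    def icons(self):
        """Emoji reminders for everything placed on this tile."""
        return [EMOJI.get(x, "?") for x in self.characters + self.objects]


def new_map(seed_locations, rng=None, size=3):
    """Create an empty size x size grid with a random location in the center."""
    rng = rng or random.Random()
    grid = {(r, c): Tile() for r in range(size) for c in range(size)}
    name, category = rng.choice(seed_locations)
    grid[(size // 2, size // 2)] = Tile(name=name, category=category)
    return grid


def place(grid, pos, name, category, characters=(), objects=()):
    """Fill a tile, as an evaluator would via the search bars."""
    grid[pos] = Tile(name, category, list(characters), list(objects))
```

For instance, `place(grid, (2, 2), "Central Bazaar", "bazaar", ["shopkeeper"], ["spices"])` mirrors the bottom-right tile described for Figure 7.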

We present results for various additional survey questions in Figure 8. As shown, evaluators with access to model suggestions self-reported that they found the locations, characters, and objects more diverse, likely because the model surfaces suggestions they may not have thought of. Further, evaluators rated liking their placed characters substantially more with access to the model than without.

The list of survey questions we asked is the following:

  • How much time, in minutes, did you spend creating this world?

  • On a scale of 1 to 5, with 5 being the best. How satisfied are you with the map you have built?

  • On a scale of 1 to 5, with 5 being the best. If you had to play a video game in this world, how satisfied would you be?

  • What did you like about the experience building this world? What would you change if you could? (freeform response)

  • Agree or Disagree: The world I have built is interesting

  • Agree or Disagree: I like the characters I put in the locations

  • Agree or Disagree: I like the objects I put in the locations

  • Agree or Disagree: I like how I linked the locations

  • Agree or Disagree: I like the diversity of locations

  • Agree or Disagree: I like the diversity of characters

  • Agree or Disagree: I like the diversity of objects

If evaluators had access to model based suggestions, they received these additional questions:

  • On a scale of 1 to 5, did you find the suggestions in the dropdown menu helpful?

  • Agree or Disagree: the suggestions made it faster for me to fill in a world

  • Agree or Disagree: the suggestions were diverse and interesting

  • Agree or Disagree: I often picked something from the suggestion

  • Agree or Disagree: I would like having the suggestions again

  • Did you like the suggestions? Why or why not? (freeform response)

  • If you could improve the suggestions, how would you do that? (freeform response)

Constant: N = maximum number of locations
Constant: P = filler probability = …
Constant: X = block percentage

Initialize an empty grid and fill the center with a randomly selected location
Set N = maximum number of locations
Block X% of grid positions
foreach location do
    foreach location neighbor description do
        Randomly choose direction for new location
        newLocation = PredictLocation(description)
        if … then
            newLocation = filler location
        end if
        Place newLocation in direction
        Prevent newLocation from being predicted again if not filler
        Randomly connect newLocation to existing surrounding locations
        newCharacters = PredictCharacter(description)
        newObjects = PredictObject(description)
        if num_locations … then
            return
        end if
    end foreach
end foreach
Algorithm 1: Creating a Playable World for LIGHT
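A runnable Python reading of Algorithm 1 is sketched below. Several parts are assumptions rather than details from the source: the filler test is taken to be a uniform random draw falling below P, the stopping rule is taken to be num_locations reaching N, and the grid size, the number of neighbors tried per location, and the 0.5 chance of extra connections are placeholders. The predictors are passed in as plain callables (hypothetical stand-ins for the trained models).

```python
import random

# Grid offsets for the four compass directions.
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def build_world(start, predict_location, predict_characters, predict_objects,
                fillers, max_locations, filler_prob, block_pct,
                size=5, neighbors_per_location=2, rng=None):
    rng = rng or random.Random()
    center = (size // 2, size // 2)
    cells = [(r, c) for r in range(size) for c in range(size) if (r, c) != center]
    # Block X% of grid positions so maps are not always full rectangles.
    blocked = set(rng.sample(cells, int(len(cells) * block_pct / 100)))

    grid = {center: start}          # position -> location name
    contents, edges = {}, set()     # placed characters/objects, links between cells
    used = {start}                  # non-filler locations already placed
    queue = [center]

    def free_neighbors(pos):
        out = []
        for dr, dc in DIRECTIONS:
            cell = (pos[0] + dr, pos[1] + dc)
            in_bounds = 0 <= cell[0] < size and 0 <= cell[1] < size
            if in_bounds and cell not in grid and cell not in blocked:
                out.append(cell)
        return out

    while queue:
        pos = queue.pop(0)
        for _ in range(neighbors_per_location):
            options = free_neighbors(pos)
            if not options:
                break
            target = rng.choice(options)  # randomly choose a direction
            name = predict_location(grid[pos], used)
            # Assumed filler rule: no prediction available, or a draw below P.
            if name is None or rng.random() < filler_prob:
                name = rng.choice(fillers)
            else:
                used.add(name)  # prevent this location from being predicted again
            grid[target] = name
            edges.add((pos, target))
            # Randomly connect the new location to other adjacent placed locations.
            for dr, dc in DIRECTIONS:
                other = (target[0] + dr, target[1] + dc)
                if other in grid and other != pos and rng.random() < 0.5:
                    edges.add((other, target))
            contents[target] = {
                "characters": predict_characters(name),
                "objects": predict_objects(name),
            }
            queue.append(target)
            # Assumed stopping rule: stop once N locations are placed.
            if len(grid) >= max_locations:
                return grid, edges, contents
    return grid, edges, contents
```

The breadth-first queue is one possible way to realize the outer "foreach location" loop; it expands locations in the order they were placed, starting from the center.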
Figure 7: User Interface for Map Creation
Figure 8: Additional Survey Question Results. Evaluators with access to model suggestions liked their placed characters more than evaluators without model suggestions, and rated the game locations, characters, and objects more diverse. Having the model suggestions likely allows evaluators to read a greater diversity of game elements.

6.4 Generated Maps

We show examples of locations arranged by our models in Figure 9 and Figure 10. Filler locations, such as unused chamber, hallway, and empty storage room, are shown in white.

Figure 9: Example Generated Map of Linked Locations
Figure 10: Example Generated Map of Linked Locations

6.5 Generated Game Elements

We employ grid search to find the hyperparameters that lead to the best performance. Selected hyperparameters are listed in Table 6. We show various additional examples of generated locations, characters, and objects from the best performing models in Table 9, Table 7, and Table 8.

Hyperparameter      Descriptions   Affordances
Number of Layers    6              6
Embedding Size      512            512
Dropout             0.1            0.1
Learning Rate
Table 6: Hyperparameters from models used for generation of game elements
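As a rough illustration of the grid search mentioned above, the sketch below enumerates every combination of candidate values and keeps the best one. The helper name `grid_search`, the candidate values, and the choice of a lower-is-better score (for example validation perplexity) are assumptions; the source does not list the search space or the selection criterion.

```python
from itertools import product


def grid_search(train_and_score, space):
    """Try every combination in `space`; return the config with the lowest score."""
    names = list(space)
    best_config, best_score = None, float("inf")
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        # `train_and_score` is a hypothetical callable that trains a model
        # with this configuration and returns a lower-is-better score.
        score = train_and_score(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space; only 6 layers, 512 dimensions, and dropout 0.1
# are values that appear in Table 6.
SPACE = {
    "num_layers": [4, 6, 8],
    "embedding_size": [256, 512],
    "dropout": [0.1, 0.3],
}
```

The cost grows with the product of the candidate list sizes (12 training runs for the space above), which is why such searches are usually kept to a few values per hyperparameter.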
Given Character: scientist
Generated Persona: I like to study about magic . the science that makes magic works. I work here on my own. I work in the lab to make sure that other people are paying attention.
Generated Description: He is a scientist that works in the lab on a small scale. He studies magic.

Given Character: general store owner
Generated Persona: I sell all my goods to the local king. I am paid to serve him and i make sure he doesn’t come in and steal from me. I am also a leader amongst the peasants.
Generated Description: This general store owner is always looking for more personal items to buy.

Given Character: ambassadors
Generated Persona: I am an ambassador of the region. I am not quite sure where I’d like to be located , but I will tell them what they need to know about my country. I would like to be able to speak with them , but I don’t know which one.
Generated Description: She is a fierce diplomat and seems to care deeply about all the people she meets that are important to them. She is a nice girl and has a very kind

Given Character: archaeologists
Generated Persona: I am a archaeologist. I travel to the kingdom to gather information. I am a historian, and I study what’s been happening around this kingdom.
Generated Description: The archaeologist is a hard working man and he spends most of its day looking for more information to share.

Table 7: Example Generated Characters, given the desired character name.
Given Object: scientific and writing instruments
Generated Description: The pen is made of metal and it seems the hand-written instrument has a unique feel in the hand.
Generated Affordances: gettable

Given Object: ghosts
Generated Description: The ghost appears to still have its memories but is a strange looking creature that lives in another world without one.
Generated Affordances: None

Given Object: mask
Generated Description: The mask looks old and wrinkled by use.
Generated Affordances: gettable, wearable

Given Object: weapons
Generated Description: The weapon is very sturdy, and the iron in the hilt is very light
Generated Affordances: gettable, weapon

Table 8: Example Generated Objects, given the desired object name.
Given Location: castle exit
Generated Description: The entrance to the castle is old and narrow - - it is dark and unbearably dangerous. The exit is large, with stone walls, and several wooden doors.
Generated Background: This castle has been abandoned many centuries ago. It is impossible to even escape. The castle is not safe for ordinary creatures.

Given Location: Underground Chamber
Generated Description: The underground chamber is a dark and dark place filled with many deadly creatures. People call it a ’cave’ but it has been abandoned by the village’s inhabitants. No human has crawled out of it.
Generated Background: This chamber contains information that people of all ages would not have gotten without their help. The villagers have a sense of urgency and fear. When they come into the chamber this chamber contains information that keeps a secret from humans.

Given Location: The Pit of Despair
Generated Description: The pit of despair is a dark place to look at and despair itself is a place where the demons hang out. There’s even more evil in this corner than there was before.
Generated Background: The pit of despair is the place where all the evil humans are kept and treated as if they are dead. They were found and tortured and the demons are all gone forever.

Given Location: School
Generated Description: The school is old and empty; only the last 10 generations had any memories of the past. The local schools look the same, but there are a lot of young - gravy looking people in there, showing their talents and passion.
Generated Background: The school’s reputation is mostly because it has been established as a place of peace throughout history. There are numerous students at each high school that live here in hopes of getting a better education. The teachers have also heard of how this school will lead to a higher life .

Table 9: Example Generated Locations, given the desired location name.