Mixed-Initiative Level Design with RL Brush

08/06/2020 ∙ by Omar Delarosa, et al. ∙ NYU college 7

This paper introduces RL Brush, a level-editing tool for tile-based games designed for mixed-initiative co-creation. The tool uses reinforcement-learning-based models to augment manual human level-design through the addition of AI-generated suggestions. Here, we apply RL Brush to designing levels for the classic puzzle game Sokoban. We put the tool online and tested it with 39 different sessions. The results show that users using the AI suggestions stay around longer and their created levels on average are more playable and more complex than without.





Introduction

Modern games often rely on procedural content generation (PCG) to create large amounts of content autonomously or with limited or no human input. PCG methods are used with many different design goals in mind, including enabling a particular aesthetic. They can also be used to streamline time-intensive tasks such as modeling and designing thousands of unique tree assets for a forest environment. By off-loading these tasks to AI agents, game projects can potentially free up time and financial resources for other tasks that AI agents are less well suited for. Additionally, by blending human creativity with AI co-creation to produce game content the human designer may not have even considered alone, we could also enable new creative directions (Shaker et al., 2016).

In Procedural Content Generation via Reinforcement Learning, or PCGRL (Khalifa et al., 2020), levels are first randomly generated and then incrementally modified to become better. The generated levels are good enough that they could be used, but they are not guaranteed to be good enough that the designer would actually use them. The resulting levels may not always align with the human designer’s needs, and the designer would have to keep generating levels until finding one that is satisfactory. There is very little human involvement and minimal control over what the resulting level will be.

In order to make this level generation method more applicable for design, we leverage the incremental nature of PCGRL in building a mixed-initiative level editing tool. Thus, this paper presents RL Brush, a human-AI collaborative tool that balances user-intent and AI model suggestions. RL Brush will allow a human designer to create levels as they please while suggesting modifications to improve the level from different AI models. The human designer may choose to accept suggestions as they see fit. The tool aims to assist and empower human designers to create levels that are good, unique, and suitable to the user’s objectives.

Related Work

Procedurally generated content has been used in games since the early 1980s. Early PCG-enabled games like Rogue (Michael Toy, 1980) used PCG to expand the overall depth of the game by generating dungeons, as well as to cope with the hardware limitations of the day (Yannakakis and Togelius, 2018). This section lays out more contemporary applications and methods of generating game content procedurally, specifically using reinforcement learning.

PCG via Reinforcement Learning

Reinforcement learning (RL) is a type of machine learning technique in which an agent typically takes an action in an environment at each time step and receives a reinforcement, interpreted as state and reward, from the environment (Sutton et al., 1998). PCGRL (Khalifa et al., 2020) introduces reinforcement learning to level generation by treating the design process as a sequential task. Different types of games provide information on the design task as functions: an evaluation function that assesses the quality of the design and a function that determines whether the goal is reached. The content generation task that the RL agents play defines the state space, action space, and transition function. For typical 2D grid-based games, the state can be represented as a 2D array or 2D tensor. Agents with different representations may observe and edit the map in different patterns. In the PCGRL paper, three types of agents, namely narrow, turtle and wide, can respectively edit a single tile, move on the map in a turtle-graphics-like way, or edit the entire map.

PCGRL Agents

The three RL-based level-design agents introduced in PCGRL (Khalifa et al., 2020) (Bhaumik et al., 2019) as narrow, turtle and wide have origins in search-based approaches to level generation. The primary focus in the subsequent sections, however, is on their RL-based implementations. This section describes these three canonical agent types.


Narrow

The narrow agent observes the state of the game and a location on the 2D-array grid representation of the game level. Its action space consists of a tile-change action: whether or not to make a change at that location and what that change would be.


Turtle

Inspired by turtle graphics languages such as Logo (Goldman et al., 2004) (Khalifa et al., 2020), the turtle agent also observes the state of the grid as a 2D array and a location on that grid. Like the narrow agent, one part of its action space is defined as a tile-change action. Unlike narrow, its action space also includes a movement action in which the agent changes its current position on the grid by applying a 4-directional translation to its location, moving it either up, down, left or right.


Wide

The wide agent also observes the state of the grid as a 2D array. However, it does not take a location parameter. Instead, its action space selects a location on the grid as the affected location and a tile-change action.
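The difference between the three representations can be illustrated by the shape of a single action. The following is a minimal sketch and not the PCGRL implementation: the function names, the tuple encodings of actions, and the clamping of turtle moves at the grid edge are all assumptions made here for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 2D array of integer tile ids
Pos = Tuple[int, int]   # (row, col)

# Hypothetical move encoding; the real PCGRL action spaces may differ.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def narrow_step(grid: Grid, pos: Pos, new_tile: Optional[int]) -> Grid:
    """Narrow: the location is given; the agent only picks a change (or none)."""
    out = [row[:] for row in grid]
    if new_tile is not None:  # None encodes "no change"
        out[pos[0]][pos[1]] = new_tile
    return out


def turtle_step(grid: Grid, pos: Pos, action: Tuple[str, object]):
    """Turtle: either move one tile in 4 directions or change the current tile."""
    kind, value = action
    if kind == "move":
        dr, dc = MOVES[value]
        # Assumption: moves that would leave the grid are clamped to the edge.
        r = min(max(pos[0] + dr, 0), len(grid) - 1)
        c = min(max(pos[1] + dc, 0), len(grid[0]) - 1)
        return grid, (r, c)
    return narrow_step(grid, pos, value), pos


def wide_step(grid: Grid, action: Tuple[Pos, int]) -> Grid:
    """Wide: the agent picks both the location and the new tile."""
    pos, new_tile = action
    return narrow_step(grid, pos, new_tile)
```

In this sketch the narrow agent is handed its position from outside, the turtle agent carries its position as part of its own state, and the wide agent encodes the position inside the action itself.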

PCG via Other Machine Learning Methods

Other machine learning approaches to procedural content generation, besides RL, are mostly based on supervised or unsupervised learning; the generic term for these is Procedural Content Generation via Machine Learning (PCGML) (Summerville et al., 2018). Mystical Tutor (Summerville and Mateas, 2016), an iteration on the Twitter bot @RoboRosewater, generates never-before-seen Magic: The Gathering cards using a long short-term memory (LSTM) neural network architecture. Torrado et al. (Torrado et al., 2019) demonstrated that Legend of Zelda (Nintendo, 1986) levels can be generated using generative adversarial networks (GANs). Compared to other types of PCGML, however, PCGRL does not in general need any training data. RL-based approaches rely on reward functions, which can be manually designed and tuned or even learned. PCGRL also differs from holistic ML approaches such as GAN-based PCGML by approaching level generation incrementally. In each step, the agent takes an action such as moving to or selecting a certain position (for example, in 2D grid space) or changing the tile at the current position. This characteristic of PCGRL makes it well suited for mixed-initiative design.

PCG via Mixed-initiative Level Design

In mixed-initiative design, the human and an AI system work together to produce the final content (Yannakakis et al., 2014; Zhu et al., 2018). Multiple mixed-initiative tools for game content creation have been invented over the years. Tanagra (Smith et al., 2010) is a prototype mixed-initiative tool for platformer level design in which AI can either generate the entire level or fill in the gaps left by human designers. Sentient Sketchbook (Liapis et al., 2013) is a tool for designing a Starcraft-like (Blizzard, 1998) strategy game. Users can sketch in low resolution and create an abstraction of the map in terms of player bases, resources, passable and impassable tiles. It uses a feasible-infeasible two-population GA (FI-2pop GA) for novelty search and generates several map suggestions as users are sketching. An example of a mixed-initiative PCG tool that generates levels for a specific game is Ropossum, which creates levels for the physics-based puzzle game Cut the Rope, based on a combination of grammatical genetic programming and logic-constrained tree search (Shaker et al., 2013b, a). Another such example is the mixed-initiative design tool for the game Refraction, which teaches fractions; that tool is built around a constraint solver which can create puzzles of specific difficulty (Butler et al., 2013).

More recently, Alvarez et al. (Alvarez et al., 2019) introduced Interactive Constrained MAP-Elites for dungeon design, which offers similar suggestion-based interaction supported by the MAP-Elites algorithm and FI-2pop evolution. Guzdial et al. (Guzdial et al., 2018) proposed a framework for co-creative level design with PCGML agents. This framework uses a level editor for Super Mario Bros (Nintendo, 1985), which allows the user to draw with a palette of level components or sprites. After finishing one turn of drawing, the user clicks a button to allow the pre-trained agent to make additions sprite by sprite. This tool is also useful for collecting training data and for evaluating PCGML models. Machado et al. used a recommender system trained on databases of existing games to recommend game elements, including sprites and rules, across games (Machado et al., 2019).


Figure 1: RL Brush screenshot of the Sokoban level editor.

This section introduces RL Brush, a mixed-initiative level-editing tool for tile-based games that uses an ensemble of trained level-design agents to offer level-editing suggestions to a human user. Figure 1 shows a screenshot of the tool (https://rlbrush.app/). The present version of RL Brush is tailored for building levels for the classic puzzle game Sokoban (Thinking Rabbit, 1982) and generating suggestions interactively.


Sokoban, or “warehouse keeper” in Japanese, is a classic 2D puzzle game in which the player’s goal is to push boxes to their designated locations (called goals) within an enclosed space. The player can only push boxes horizontally or vertically. The number of boxes is equal to the number of designated locations. The player wins when all boxes are in the correct locations.

RL Brush

In the spirit of human-AI co-creation tools like Evolutionary Dungeon Designer (Alvarez et al., 2018) and Sentient Sketchbook (Liapis et al., 2013), RL Brush interactively presents suggested edits to a human level creator, 4 suggestions at a time. Instead of using search-based approaches to generate the suggestions, RL Brush utilizes the reinforcement-learning-based level-design agents presented by Khalifa et al. (Khalifa et al., 2020). RL Brush builds on PCGRL (Khalifa et al., 2020) by combining user interactions with the level-designing narrow, turtle and wide agents and an additional majority meta-agent into a human-in-the-loop, interactive co-creation system.

Architecture Overview

Figure 2: RL Brush System Architecture.

Figure 2 shows the system architecture for our tool RL Brush. The system consists of 4 main components:

  • GridView: is responsible for rendering and modifying the current level state.

  • TileEditorView: allows the user to select tools to edit the current level viewed in the GridView.

  • SuggestionView: shows the different AI suggestions for the current level in the GridView.

  • ModelManager: updates all the suggestions viewed in the SuggestionView whenever the current level changes in the GridView.

The user can edit the current level (G) either by selecting a suggestion from the SuggestionView or by using a tool from the TileEditorView and modifying the map directly. This change emits a signal to the ModelManager component with the new grid (G’). The ModelManager runs all the AI models, collects their results, and sends them back to the SuggestionView. The ModelManager is described in more detail in a subsequent section.
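The signal flow between these components can be sketched in a few lines. This is only an illustration of the described data flow under stated assumptions: the class names follow the component names above, but every method name (`set_tile`, `on_grid_changed`, `show`) and the representation of an agent as a plain callable are hypothetical.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]
Agent = Callable[[Grid], Grid]  # assumption: an agent maps a grid to a suggested grid


class SuggestionView:
    """Holds the latest suggested grid per agent (rendering omitted)."""

    def __init__(self) -> None:
        self.suggestions: Dict[str, Grid] = {}

    def show(self, suggestions: Dict[str, Grid]) -> None:
        self.suggestions = suggestions


class ModelManager:
    """Runs every agent on the new grid and forwards the results."""

    def __init__(self, agents: Dict[str, Agent], view: SuggestionView) -> None:
        self.agents = agents
        self.view = view

    def on_grid_changed(self, grid: Grid) -> None:
        results = {name: agent(grid) for name, agent in self.agents.items()}
        self.view.show(results)


class GridView:
    """Owns the current level and signals the ModelManager on every edit."""

    def __init__(self, grid: Grid, manager: ModelManager) -> None:
        self.grid = grid
        self.manager = manager

    def set_tile(self, row: int, col: int, tile: int) -> None:
        self.grid = [r[:] for r in self.grid]  # copy so old states stay intact
        self.grid[row][col] = tile
        self.manager.on_grid_changed(self.grid)
```

A manual edit through `GridView.set_tile` thus triggers one round of suggestions; accepting a suggestion would replace the grid and trigger the same signal.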

Human-Driven, AI-Augmented Design

Both the TileEditorView and the SuggestionView respond only to user-interactions in order to ultimately provide the human in the loop the final say on whether to accept the AI suggestions or override them through manual edits. The goal is to provide a best-of-both-worlds approach to human and AI co-creation in which the controls of a conventional level-editor can be augmented by AI suggestions without replacing the functionality a user would have expected from a manual tile editor. Instead, the human drives the entire level design process while taking on a more collaborative role with the ensemble of AI level-design agents.

ModelManager Data Flow

Figure 3: Each suggestion in the UI is generated by a different agent whose name appears below its diff rendering. Clicking on the suggestion applies it to the grid.

The ModelManager in figure 2 handles the interactions with the PCGRL agents a and the meta-agents m. The ModelManager gets the current level state and sends it to these agents, which edit it; it then emits a stream of SuggestedGrid objects. The SuggestionView in turn observes the stream of suggested grids G and generates suggestions s from G by diffing them against the current level state, producing a list of suggestions that is rendered and presented to the user in the UI’s suggestion box (figure 3).
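The diffing step can be sketched as follows. This is a minimal illustration that assumes grids are equally sized lists of lists of integer tile ids and that a suggestion is represented as a mapping from position to new tile; `diff_grids` and `apply_suggestion` are hypothetical names, not part of RL Brush.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
Suggestion = Dict[Tuple[int, int], int]  # (row, col) -> suggested tile id


def diff_grids(current: Grid, suggested: Grid) -> Suggestion:
    """Return every tile of `suggested` that differs from `current`."""
    return {
        (r, c): new
        for r, (cur_row, new_row) in enumerate(zip(current, suggested))
        for c, (old, new) in enumerate(zip(cur_row, new_row))
        if old != new
    }


def apply_suggestion(current: Grid, suggestion: Suggestion) -> Grid:
    """Return a copy of `current` with the suggested tile changes applied."""
    out = [row[:] for row in current]
    for (r, c), tile in suggestion.items():
        out[r][c] = tile
    return out
```

Representing a suggestion as a sparse diff keeps it cheap to render as an overlay, and applying it to the current grid reproduces the agent's suggested grid.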

Meta-agents in m combine or aggregate the results of the agents in a in some way to generate their own results. In RL Brush, the majority agent is an example of a meta-agent: it aggregates the suggestions of one or more of the agents into a new suggestion. The majority meta-agent is powered by a purely rule-based model that only suggests a tile mutation if the majority of the agents have the same tile mutation in their suggestions. In our case, we use 3 different PCGRL agents (narrow, turtle, and wide), which means at least 2 agents have to agree on the same tile mutation.
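The majority rule can be sketched as a simple vote count. The code below is an illustration under the assumption that each agent's suggestion is a mapping from grid position to suggested tile id; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Suggestion = Dict[Tuple[int, int], int]  # (row, col) -> suggested tile id


def majority_suggestion(suggestions: List[Suggestion]) -> Suggestion:
    """Keep a tile mutation only if a strict majority of agents propose it."""
    votes: Counter = Counter()
    for suggestion in suggestions:
        for pos, tile in suggestion.items():
            votes[(pos, tile)] += 1
    threshold = len(suggestions) // 2 + 1  # 2 of 3 agents in RL Brush
    return {pos: tile for (pos, tile), n in votes.items() if n >= threshold}
```

For example, if narrow and turtle both propose tile 1 at position (0, 0) while wide proposes something elsewhere, only the mutation at (0, 0) survives. Because each agent proposes at most one tile per position, two conflicting mutations can never both reach a strict majority.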

ModelManager’s Hyper-Parameters

Figure 4: These two UI elements a and b control the step and tool radius parameters respectively.

Two primary hyper-parameters exist in RL Brush for tuning the performance of ModelManager. One is the number of steps and the other is the tool radius. These are each controlled from the UI using the components in figure 4.

Figure 5: The changes in the step parameter control the number of iterations n in the loop of recursive ChangeEvent objects that feed back into the ModelManager

The step parameter controls how many times the ModelManager calls itself recursively (Fig. 5). For a step value of n, the ModelManager calls itself recursively n times on a self-generated stream of ChangeEvent (G’) objects. A higher step value allows agents to make more than one modification to the map. This is an important hyper-parameter because most of these agents are trained not to be greedy and attempt modifications that require long-term edits. Limiting these agents to a single step would constrain them, and their suggestions might not be very interesting for users.
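The effect of the step parameter can be sketched as feeding an agent its own output n times. This illustration uses a loop rather than recursion, and both `run_steps` and the toy agent `fill_first_empty` are hypothetical stand-ins for a trained PCGRL agent.

```python
from typing import Callable, List

Grid = List[List[int]]


def run_steps(grid: Grid, agent: Callable[[Grid], Grid], steps: int) -> Grid:
    """Apply `agent` to its own output `steps` times and return the result."""
    for _ in range(steps):
        grid = agent(grid)
    return grid


def fill_first_empty(grid: Grid) -> Grid:
    """Toy agent: turn the first empty tile (0) into a wall (1)."""
    out = [row[:] for row in grid]
    for row in out:
        for c, tile in enumerate(row):
            if tile == 0:
                row[c] = 1
                return out
    return out
```

With `steps=1` the toy agent changes a single tile; with `steps=2` its suggestion contains two changes, which mirrors how a higher step value lets an agent express a longer sequence of edits in one suggestion.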

Figure 6: The changes in the tool radius parameter control the size of the slice of grid G’ that is visible to the agents as input.

The tool radius parameter controls how large a window of tiles is visible to the agent as input. Agents cannot provide suggestions outside of this window. It focuses the suggestions around the area the user is modifying at the current step. In Fig. 6 the white tiles are padded as empty or as walls, depending on the agent. The red tiles represent the integer values of each tile on the grid G. The green tile represents the pivot tile, that is, the position on the grid G that the user last clicked on if a tile was added manually. In cases where no tile was clicked (such as when the user accepted an AI suggestion), the center of the grid G is used as the pivot tile. The radius refers to the radius of the von Neumann neighborhood with respect to the pivot tile. Note, however, that when the grid G is small relative to the radius, the entire grid is used, as in the case of microbans.
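One way to read the tool radius is as a mask over the grid centered on the pivot tile. The sketch below is an assumption-laden illustration: it returns a square (2r+1) by (2r+1) window in which every tile outside the von Neumann neighborhood (Manhattan distance greater than r) or outside the grid is replaced by a padding value. Whether RL Brush pads both cases in this way is not stated in the text, and the function name is hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def visible_window(grid: Grid, pivot: Tuple[int, int], radius: int, pad: int) -> Grid:
    """Return the window around `pivot` that an agent would see as input."""
    pr, pc = pivot
    rows, cols = len(grid), len(grid[0])
    window = []
    for r in range(pr - radius, pr + radius + 1):
        line = []
        for c in range(pc - radius, pc + radius + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            # von Neumann neighborhood: Manhattan distance to the pivot
            near = abs(r - pr) + abs(c - pc) <= radius
            line.append(grid[r][c] if inside and near else pad)
        window.append(line)
    return window
```

The `pad` argument corresponds to the choice of padding with empty tiles or with walls, which the text says depends on the agent.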


Experiments

This section describes a user study conducted to examine the interactions between users and the AI suggestions. We are primarily interested in answering the following four questions:

  • Q1: Do users prefer to use the AI suggestions or not?

  • Q2: Does the AI guide users to designing more playable levels?

  • Q3: Which AI suggestions yield higher engagement from users?

  • Q4: What is the effect of the AI suggestions on the playable levels?

Metric                                         Value
Total User Sessions                            75
Total Interaction Events                       3165
Total Ghost Suggestions Accepted               308
Level Versions Per Session                     10.6
Ghost Suggestions Accepted Per User Session    4.11
Total Interactions Per Session                 42.2

Table 1: Interaction Event Summary (total event counts).

For the experiment, we published the RL Brush web app (https://rlbrush.app/) and captured user-interaction events on a web server. Over the course of about 2 weeks, a total of 75 user sessions were created. Table 1 shows the counts of key metrics that we used to measure the interactions between users and the RL Brush UI. For instance, each session resulted in an average of 10.6 level versions over each user’s 42.2 interactions with the UI (i.e. button presses or clicks) during the course of the session. Of these 10.6 level versions, 4.11 were generated using the AI-suggested edits, or ghost suggestions.


              Used AI    Didn’t Use AI    Total
Playable      9          2                11
Unplayable    8          20               28
Total         17         22               39

Table 2: Statistics on the 39 full sessions.

Of these 75 user sessions, 39 were logged in full from start to end. We analyzed these sessions on an event-by-event basis and found a few trends. Table 2 shows the statistics for these 39 fully logged sessions. The number of users who didn’t use the AI (22) is slightly higher than the number who did (17). There may be many reasons why users never engaged with the system, but we suspect the absence of a formal tutorial could have affected the results here. On the other hand, users who interacted with at least one AI suggestion produced more playable levels (9 out of 17) than users who did not interact with AI suggestions at all (2 out of 22). This suggests that the AI suggestions nudge users toward building playable levels.
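The proportions behind this comparison follow directly from the counts in Table 2. The short sketch below only restates that arithmetic; the dictionary layout and function name are illustrative choices.

```python
# Counts copied from Table 2 (39 fully logged sessions).
table2 = {
    "used_ai": {"playable": 9, "unplayable": 8},
    "no_ai": {"playable": 2, "unplayable": 20},
}


def playable_rate(counts: dict) -> float:
    """Fraction of sessions in a group that ended with a playable level."""
    total = counts["playable"] + counts["unplayable"]
    return counts["playable"] / total


rates = {group: playable_rate(counts) for group, counts in table2.items()}
```

This gives a playable rate of about 52.9% (9 of 17) for sessions that used AI suggestions and about 9.1% (2 of 22) for sessions that did not.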

Figure 7: Users who did not use any AI suggestions seemed to take longer to create valid board states.

One such trend is that, among users who had at least one valid, solvable board during their session, those who interacted with at least one AI suggestion created a valid board earlier in their session than those who did not use any AI suggestions, as illustrated in Fig. 7. Perhaps the higher learning curve of using AI suggestions makes it less immediately obvious to users than the directness of manual edits. Conversely, this could mean that having AI suggestions in the system makes users more engaged overall.

Figure 8: Users that used AI suggestions seemed to create levels that required more steps in their solutions.

Another trend can be seen in Fig. 8, where the solution length, calculated using a BFS (breadth-first search) solver, of levels created with the assistance of AI is on average longer than that of levels created without AI. This could also indicate that AI suggestions yield higher overall engagement and direct users toward creating more complex levels.
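A breadth-first search over Sokoban states can be sketched as below. This is not the solver used in the study: the level encoding ('#' wall, '@' player, '$' box, '.' goal, space for floor), the function name, and the choice to measure solution length in player moves rather than box pushes are all assumptions made for this illustration.

```python
from collections import deque
from typing import List, Optional


def solve_sokoban(level: List[str]) -> Optional[int]:
    """Return the length of a shortest solution in player moves, or None."""
    floor, goals, boxes = set(), set(), set()
    player = None
    for r, row in enumerate(level):
        for c, ch in enumerate(row):
            if ch == "#":
                continue
            floor.add((r, c))  # every non-wall cell can be walked on
            if ch == ".":
                goals.add((r, c))
            elif ch == "$":
                boxes.add((r, c))
            elif ch == "@":
                player = (r, c)
    if player is None:
        return None

    start = (player, frozenset(boxes))
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (pos, bxs), depth = queue.popleft()
        if bxs == goals:  # every box rests on a goal
            return depth
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (pos[0] + dr, pos[1] + dc)
            if nxt not in floor:
                continue
            new_boxes = bxs
            if nxt in bxs:  # moving into a box pushes it one tile further
                beyond = (nxt[0] + dr, nxt[1] + dc)
                if beyond not in floor or beyond in bxs:
                    continue
                new_boxes = (bxs - {nxt}) | {beyond}
            state = (nxt, new_boxes)
            if state not in seen:
                seen.add(state)
                queue.append((state, depth + 1))
    return None  # search space exhausted: the level is unplayable
```

Because BFS explores states in order of depth, the first solved state it reaches has the minimum number of moves, so the returned value can serve as a solution-length measure and a `None` result marks a level as unplayable.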

Figure 9: The majority agent seems to be the most popular across sessions and received the most total interactions or clicks.

Since we provided different models to pick from, we were curious to check which suggestions were most useful for the users. Fig. 9 shows a histogram of how often each model was used. We found that the majority voting suggestion works far better than we expected, given that different agents could have different suggestions and not agree on what to do. In the Discussion section, we discuss plans for further investigations into aggregated suggestions.

Figure 10: The number of AI suggestions accepted and the overall solution complexity seem to be linearly correlated.

Finally, we calculated the correlation between the number of accepted AI suggestions in a session and the solution length of the created level. Fig. 10 shows a weak linear correlation between the number of AI suggestions used during level creation and the maximum level difficulty achieved during that session, in terms of solution length. This correlation suggests that AI suggestions nudge users toward creating more complex levels with longer solutions. We would like to investigate this in future work with more users.


Discussion

The system described here can be seen as a proof of concept for the idea of building mixed-initiative interaction on PCG methods based on sequential decisions. Most search-based PCG methods, as well as most PCGML methods, output the whole level (or other type of content) as a unit. PCGRL, where the generator has learned to design one step at a time, might afford a mode of interaction more suited to how a human designs a level. It would be interesting to investigate whether tree search approaches to level generation could be harnessed similarly (Bhaumik et al., 2019).

Looking back at the results, we can say that RL Brush was able to engage more users in creating complex playable levels. We also noticed that more users engaged with the meta-agent than with the other models. Comparing these results with our questions introduced in the Experiments section:

  • Q1: Based on the data we have, we can’t clearly say whether users preferred to use the system with AI or without, but users who used the AI were more engaged overall.

  • Q2: From the collected statistics, the proportion of playable levels among users who used the AI is much larger than among those who did not.

  • Q3: The majority agent was the agent users engaged with most; people interacted with it far more than with all the rest. We should experiment with new ideas for meta-agents in future work.

  • Q4: The results indicate, with a very small correlation, that the AI system helps create more complex levels with longer solution lengths, but more data is needed to verify that.

In addition to the results described in the previous section, a broader test of human users could further explore the quality of the levels generated beyond the scope of automated solvers and through the use of human play-testing. Additional metrics can be gathered to support this and more targeted, supervised user research can be done here.

Once the broader user studies have been conducted, additional client-side models can be added to RL Brush that learn the weights of meta-agents and continuously optimize them through online model training. In this way, we could better leverage the capabilities of the ModelManager’s ensemble architecture. Furthermore, the existing PCGRL models could be extended to continuously train online using reward functions incorporating parameters based on user actions. Similarly, novel client-side models specifically tailored to improve the UX (user experience) could be incorporated into future versions that better leverage the capabilities of TensorFlow.js, which RL Brush utilizes in its code already.

Subsequent versions would also add support for additional games, level-design agent types and grids in order to increase the overall utility of RL Brush as a functional design tool.


In the previous sections we have introduced how RL Brush provides a way to seamlessly integrate human level editing with AI suggestions with an opt-in paradigm. The results of the user study suggest that using the AI suggestions in the context of level editing has an impact on the quality of the resulting levels. In general, using the AI suggestions resulted in more playable levels per session and levels of higher quality, as measured by solution length.

There is clearly more work to do in this general direction. We don’t know yet to what types of levels and other content this method can be applied, and there are certainly other types of interaction possible with an RL-trained incremental PCG algorithm. RL Brush will hopefully serve as a nexus of discovery in the space of using PCGRL in game-level design.


We would like to thank the reviewer agents for reading the paper, and making editing suggestions so that we can improve it in a mixed-initiative manner.


  • A. Alvarez, S. Dahlskog, J. Font, J. Holmberg, and S. Johansson (2018) Assessing aesthetic criteria in the evolutionary dungeon designer. In Proceedings of the 13th International Conference on the Foundations of Digital Games, pp. 1–4. Cited by: RL Brush.
  • A. Alvarez, S. Dahlskog, J. Font, and J. Togelius (2019) Empowering quality diversity in dungeon design with interactive constrained map-elites. In 2019 IEEE Conference on Games (CoG), pp. 1–8. Cited by: PCG via Mixed-initiative Level Design.
  • D. Bhaumik, A. Khalifa, M. C. Green, and J. Togelius (2019) Tree search vs optimization approaches for map generation. arXiv preprint arXiv:1903.11678. Cited by: PCGRL Agents, Discussion.
  • E. Butler, A. M. Smith, Y. Liu, and Z. Popovic (2013) A mixed-initiative tool for designing level progressions in games. In Proceedings of the 26th annual ACM symposium on User interface software and technology, pp. 377–386. Cited by: PCG via Mixed-initiative Level Design.
  • R. Goldman, S. Schaefer, and T. Ju (2004) Turtle geometry in computer graphics and computer-aided design. Computer-Aided Design 36 (14), pp. 1471–1482. Cited by: Turtle.
  • M. Guzdial, N. Liao, and M. Riedl (2018) Co-creative level design via machine learning. arXiv preprint arXiv:1809.09420. Cited by: PCG via Mixed-initiative Level Design.
  • A. Khalifa, P. Bontrager, S. Earle, and J. Togelius (2020) PCGRL: procedural content generation via reinforcement learning. arXiv preprint arXiv:2001.09212. Cited by: Introduction, PCG via Reinforcement Learning, Turtle, PCGRL Agents, RL Brush.
  • A. Liapis, G. N. Yannakakis, and J. Togelius (2013) Sentient sketchbook: computer-assisted game level authoring. Cited by: PCG via Mixed-initiative Level Design, RL Brush.
  • T. Machado, D. Gopstein, A. Nealen, and J. Togelius (2019) Pitako-recommending game design elements in cicero. In 2019 IEEE Conference on Games (CoG), pp. 1–8. Cited by: PCG via Mixed-initiative Level Design.
  • N. Shaker, M. Shaker, and J. Togelius (2013a) Evolving playable content for cut the rope through a simulation-based approach. In Ninth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: PCG via Mixed-initiative Level Design.
  • N. Shaker, M. Shaker, and J. Togelius (2013b) Ropossum: an authoring tool for designing, optimizing and solving cut the rope levels. In Ninth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: PCG via Mixed-initiative Level Design.
  • N. Shaker, J. Togelius, and M. J. Nelson (2016) Procedural content generation in games. Springer. Cited by: Introduction.
  • G. Smith, J. Whitehead, and M. Mateas (2010) Tanagra: a mixed-initiative level design tool. pp. 209–216. Cited by: PCG via Mixed-initiative Level Design.
  • A. J. Summerville and M. Mateas (2016) Mystical tutor: a magic: the gathering design assistant via denoising sequence-to-sequence learning. In Twelfth artificial intelligence and interactive digital entertainment conference, Cited by: PCG via Other Machine Learning Methods.
  • A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, and J. Togelius (2018) Procedural content generation via machine learning (pcgml). IEEE Transactions on Games 10 (3), pp. 257–270. Cited by: PCG via Other Machine Learning Methods.
  • R. S. Sutton, A. G. Barto, et al. (1998) Introduction to reinforcement learning. Vol. 135, MIT press Cambridge. Cited by: PCG via Reinforcement Learning.
  • R. R. Torrado, A. Khalifa, M. C. Green, N. Justesen, S. Risi, and J. Togelius (2019) Bootstrapping conditional gans for video game level generation. External Links: 1910.01603 Cited by: PCG via Other Machine Learning Methods.
  • G. N. Yannakakis, A. Liapis, and C. Alexopoulos (2014) Mixed-initiative co-creativity. Cited by: PCG via Mixed-initiative Level Design.
  • G. N. Yannakakis and J. Togelius (2018) Artificial intelligence and games. Vol. 2, Springer. Cited by: Related Work.
  • J. Zhu, A. Liapis, S. Risi, R. Bidarra, and G. M. Youngblood (2018) Explainable ai for designers: a human-centered perspective on mixed-initiative co-creation. In 2018 IEEE Conference on Computational Intelligence and Games (CIG), pp. 1–8. Cited by: PCG via Mixed-initiative Level Design.