Mapping Hearthstone Deck Spaces through MAP-Elites with Sliding Boundaries

04/24/2019 ∙ by Matthew C. Fontaine, et al. ∙ New Jersey Institute of Technology

Quality diversity (QD) algorithms such as MAP-Elites have emerged as a powerful alternative to traditional single-objective optimization methods. They were initially applied to evolutionary robotics problems such as locomotion and maze navigation, but have yet to see widespread application. We argue that these algorithms are perfectly suited to the rich domain of video games, which contains many relevant problems with a multitude of successful strategies and often also multiple dimensions along which solutions can vary. This paper introduces a novel modification of the MAP-Elites algorithm called MAP-Elites with Sliding Boundaries (MESB) and applies it to the design and rebalancing of Hearthstone, a popular collectible card game chosen for its number of multidimensional behavior features relevant to particular styles of play. To avoid overpopulating cells with conflated behaviors, MESB slides the boundaries of cells based on the distribution of evolved individuals. Experiments in this paper demonstrate the performance of MESB in Hearthstone. Results suggest MESB finds diverse ways of playing the game well along the selected behavioral dimensions. Further analysis of the evolved strategies reveals common patterns that recur across behavioral dimensions and explores how MESB can help rebalance the game.




1. Introduction

Quality diversity (QD) algorithms have recently emerged as a powerful alternative to traditional single-objective optimization methods (Pugh et al., 2016). Because of their ability to discover multiple and diverse optima in a search space, they are well-suited for domains with many types of viable solutions. In comparison to single-objective optimization methods, QD algorithms may better approximate the variety of strategies that humans develop to navigate complex real-world environments.

Despite their potential impact, QD algorithms are predominately explored in only a fraction of the possible domains that may benefit from them, including problems in traditional evolutionary robotics such as locomotion (Cully et al., 2015) and maze navigation (Pugh et al., 2016). However, games offer a potentially fruitful avenue for QD research because of the multitude of possible strategies that can result in a success or win condition, as well as the multiple dimensions along which game content can and sometimes should vary (Khalifa et al., 2018). Exploring domains beyond evolutionary robotics is critical for understanding both the strengths and limitations of this new class of algorithms.

Hearthstone (hea, [n. d.]) is a popular collectible card game that presents a variety of AI-based challenges, including developing strategies for gameplaying and deckbuilding. This paper adapts the canonical MAP-Elites (short for Multi-dimensional Archive of Phenotypic Elites) QD algorithm to generate decks in Hearthstone, where the main challenge is not to find a gameplaying or problem-solving strategy (or set of strategies), but instead to evolve a deck, which can be seen as a toolbox around which a human player can construct winning strategies. In this way, Hearthstone presents a fundamentally new kind of domain for QD algorithms, which have previously been applied to search for controllers or strategies directly. Additionally, applying a QD algorithm to games like Hearthstone presents a novel challenge because viable decks must be able to adapt to an opponent actively and antagonistically changing the environment (herein interpreted as game state).

The principal contributions of this paper are a new modification of the MAP-Elites algorithm (MESB), a new application of this modification to generating decks in Hearthstone, and several results concerning the availability of good decks in the basic and classic sets of cards in Hearthstone. Called MAP-Elites with Sliding Boundaries (MESB), this modification introduces sliding boundaries, which allow for better handling of an unequal distribution of promising solutions. While there are a few earlier examples of QD (Khalifa et al., 2018) and constrained search (Gravina et al., 2016) algorithms for content generation in games, here we not only demonstrate the viability of MAP-Elites (and QD algorithms in general) for generative deck design, but we also show how analyzing sets of diverse evolved decks can provide insights into the dynamics of the game and concrete suggestions for how to rebalance a real and widely-played collectible card game.

The following section provides a brief overview of both the computational design and analysis of collectible card games and quality diversity algorithms. Our approach, including the novel MAP-Elites modification MESB and the Hearthstone simulator, is then presented. Next comes a series of experiments in which we generate sets of decks under various conditions and analyze these sets by mining frequent patterns to investigate dominant cards that recur in evolved decks despite the diversity maintenance mechanism inherent in MAP-Elites. The results of a rebalancing experiment, wherein the insights gathered from the analysis of evolved decks are used to change the dynamics of the game, are then reported. Finally, the results are discussed and future work is described.

2. Background

This section first describes deckbuilding in Hearthstone, then reviews automated approaches to deckbuilding and playtesting before describing quality diversity and the MAP-Elites algorithm.

2.1. Hearthstone

Published by Blizzard, Hearthstone is an adversarial, online collectible card game, where two players take turns placing cards on the digital board shared between them. When a Hearthstone game begins, each player has 30 health points and it ends when one player’s health is reduced to zero. Players are represented by one of nine possible hero classes, each with a unique hero ability that can be played at most once per turn and a subset of cards only playable by the class. While there are over 1900 possible cards to collect, at the start of each turn players draw one card to add to their hand from a subset of 30 that they have pre-selected for their decks. Deckbuilding and deciding when and how to play cards in the deck are two distinct strategy challenges. This paper explores strategies for building decks, leaving the gameplay strategies constant for each player.

While Blizzard has so far released thirteen sets of cards, when initially introduced in 2014 there were two: basic and classic, which together contain 171 playable cards for each class. Cards can be added to a deck at most twice, except for special legendary cards that can be included only once. Because there are approximately decks that can be composed for a given class with these two sets of cards, finding quality decks within this space is a significant challenge. While future experiments will explore deckbuilding with cards from different sets, experiments in this paper focus on the initial two.

Figure 1. Card Nerf. Because this card was too powerful, Nourish was nerfed to balance the game in December 2018 from costing five mana to its current cost of six.

Because of the complexity of the deckspace, the game is difficult to balance. However, balance is necessary to ensure players can win with a variety of decks, heroes, and gameplay strategies. Blizzard regularly edits the properties of popular cards to increase (i.e. buff) or decrease (i.e. nerf) their power and popularity in the Hearthstone community. Figure 1 shows an example nerf to the card Nourish, which was nerfed by increasing the mana cost. It is difficult to determine a priori which nerfs and buffs will balance the game, so Blizzard regularly relies on player data to make these determinations.

Because of the number of possible deckbuilding and gameplay strategies, it is important that computational tools for balancing the game are capable of simulating a diversity of strategies that reflect the variety of real-world playstyles. For example, one popular gameplay strategy is an aggressive aggro approach where players focus on dealing damage to the opponent as quickly as possible. However, a control strategy instead focuses on maintaining board control and dealing damage to the opponent only once control is established. There are other popular strategies like ramp, midrange, one turn kill, combo, and fatigue, but experiments in this paper consider only the first two as they are the most straightforward to implement and therefore more likely to simulate human play. Regardless, to play well it is important that a deckbuilding strategy complements the gameplay strategy.

2.2. SabberStone Simulator


SabberStone is a Hearthstone simulator that implements the rules of the game and acts directly on the card definitions provided by Blizzard. The simulation includes an AI player that implements a turn-local strategy. Action sequences are randomly generated and then evaluated with a basic game tree search algorithm where game states are hashed such that each game state is evaluated at most once. When an action sequence reaches the end of a turn, the game state is evaluated by a heuristic for playing the game well. Different heuristics can mimic specific play styles, like the aggro and control heuristics previously described.

2.3. Quality Diversity and MAP-Elites

Quality diversity (QD) algorithms, sometimes referred to as illumination algorithms, are inspired by the ability of evolution in nature to discover many niches and many different viable strategies for survival and reproduction (Pugh et al., 2016). This paradigm differs from traditional evolutionary computation, which optimizes towards a single objective, but also from multiobjective evolutionary optimization, where the trade-offs between multiple objectives are explored.

Novelty search (Lehman and Stanley, 2008), which replaces the standard fitness gradient with a reward for finding individuals that are simply different from anything found previously, is an important predecessor of modern QD algorithms. This strategy is beneficial because finding high-performing regions of so-called deceptive search spaces may require traversing intermediate “stepping stones” found in low-fitness regions. Since the advent of novelty search, the field of evolutionary computation has seen a focused interest in algorithms exploring beyond the fitness-only approach, for instance by combining novelty with multi-objective search (Mouret, 2011). Note that typical multi-objective optimization algorithms do not necessarily have this stepping stone collection property. At the time of this writing, the most popular QD algorithms are MAP-Elites (Cully et al., 2015) and novelty search with local competition (often abbreviated as NSLC) (Lehman and Stanley, 2011). These algorithms differ from vanilla novelty search because they each incorporate a performance-based quality measure combined with an incentive for novelty or diversity. MAP-Elites in particular has been popular recently due to its relative simplicity and strong performance on evolutionary robotics domains such as many-legged locomotion. Inspired by the idea of collecting a behavioral repertoire (Cully and Mouret, 2013, 2016), MAP-Elites imposes a discretized grid over a continuous behavior space and then collects the highest-performing individuals within each grid cell. In this way, the algorithm maintains a diverse set of phenotypes from which to generate new populations. The algorithm’s ability to thereby find many solutions to a given problem makes it particularly applicable to Hearthstone.
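To make the grid-and-elites idea concrete, the following is a minimal sketch of a generic MAP-Elites loop on a toy two-dimensional problem. It is not the implementation used in this paper: the function names, the toy fitness function, the uniform grid, and all parameter values are assumptions chosen for illustration.

```python
import random


def map_elites(evaluate, random_solution, mutate, cells_per_dim, bounds,
               n_init=50, n_iters=500, seed=0):
    """Toy MAP-Elites: keep the best individual found in each grid cell."""
    rng = random.Random(seed)
    archive = {}  # cell index tuple -> (fitness, solution)

    def cell_of(behavior):
        index = []
        for value, (low, high) in zip(behavior, bounds):
            frac = (value - low) / (high - low)
            # Clamp so values on or beyond a bound land in an edge cell.
            index.append(min(cells_per_dim - 1,
                             max(0, int(frac * cells_per_dim))))
        return tuple(index)

    for i in range(n_init + n_iters):
        if i < n_init or not archive:
            candidate = random_solution(rng)
        else:
            # Pick a random elite as parent; sorting keeps runs reproducible.
            _, parent = archive[rng.choice(sorted(archive))]
            candidate = mutate(parent, rng)
        fitness, behavior = evaluate(candidate)
        cell = cell_of(behavior)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, candidate)
    return archive


# Hypothetical toy problem: the behavior is the point itself and the
# fitness is the negated squared distance from the origin.
def evaluate(x):
    return -(x[0] ** 2 + x[1] ** 2), (x[0], x[1])


def random_solution(rng):
    return [rng.uniform(-1, 1), rng.uniform(-1, 1)]


def mutate(x, rng):
    return [min(1.0, max(-1.0, v + rng.gauss(0, 0.1))) for v in x]


archive = map_elites(evaluate, random_solution, mutate,
                     cells_per_dim=5, bounds=[(-1, 1), (-1, 1)])
```

The returned archive holds at most one elite per cell, so its size measures how much of the behavior space was covered, while the stored fitness values measure quality within each cell.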

2.4. Automated Deckbuilding and Playtesting

As far as we know, no previous work has applied QD algorithms to deckbuilding and playtesting. This section therefore reviews non-QD approaches (including some non-evolutionary methods) to these problems.

Evolutionary methods can help explore the design spaces of games by finding hidden aspects of these spaces, potentially revealing novel insights. For example, Togelius and Schmidhuber (2008); Browne and Maire (2010) evolve new games predicted to be interesting for human players. Yavalath is an example of such a game that was commercially published and currently has a ranking of 7.2 on the popular website for ranking board games called BoardGameGeek. Other approaches explore the design and strategy spaces of particular games like creating variants of Flappy Bird by tuning game parameters (Isaksen et al., 2015). de Mesentier Silva et al. (2016) search the space of gameplay strategies for Blackjack and Poker (de Mesentier Silva et al., 2018b, c). While the discovered heuristics were simple and intended to instruct novice players, they were often comparable to or more successful than those describing more complex strategies.

Games can be fundamentally rebalanced with even small changes to the rules (Isaksen et al., 2016), mechanics representation (de Mesentier Silva et al., 2018a) or game content (Kowalski et al., 2018). However, Hom and Marks (2007) explore automated approaches to balancing through changing game rules until agents are able to play against each other with relatively equal winrates and few draws. Through his Machinations framework, Dormans (2011) helps designers make small rule changes early on to ensure balance throughout the design process. Alternatively, Jaffe et al. (2012) present a framework to evaluate balance through comparing standard gameplaying agents to those with restricted freedom. Results indicate that such a process can help balance an educational card game. Like the previous approach, Preuss et al. (2018) balance games through integrating human and automated testing. While these types of changes may be necessary for game balance, they often have a large impact on the gameplay.

There are several approaches to evolutionary deckbuilding that this paper builds upon. Volz et al. (2016) explore evolutionary deckbuilding for the game Top Trumps that optimized different objectives with a goal of expressing fairness in the game. However, in Top Trumps all of the cards in a given pack are distributed to all of the players, whereas Hearthstone players build their own decks individually and with intention. Mahlmann et al. (2012) also search for balanced card sets in Dominion through automated agents with different skill levels and evolutionary parameters. While in Hearthstone players must build their decks before playing, decks in Dominion are built through play. While AI-based approaches to playtesting have proven effective, the Hearthstone domain poses some unique challenges.

Hearthstone is a particularly challenging game because of the amount of information hidden from the player, stochasticity, and high branching factor. As a result, there are many approaches to creating AI agents to play Hearthstone (Świechowski et al., 2018; da Silva and Goes, 2018; Zhang and Buro, 2017; Santos et al., 2017; Stiegler et al., 2017; Grad, 2017; Dockhorn et al., 2018). Furthermore, there have been significant advancements in win predictions based on game state evaluation (Janusz et al., 2017; Janusz et al., 2018; Jakubik, 2017). Bursztein (2016) took a unique approach by building a predictor to determine the next card that the opponent is likely to play. Modeling gameplay strategies in Hearthstone is an open problem.

Hearthstone deckbuilding has been explored with techniques other than evolution. Chen et al. presented a deck recommendation system that makes suggestions to improve the performance on the current match-up (Chen et al., 2018). Stiegler et al. introduced a utility metric to classify cards in relation to a deck being built (Stiegler et al., 2016). The method adds the cards with highest utility to the deck and then proceeds to recalculate the utility of the remaining card pool. All methods output one deck as a result. Zook et al. previously evaluated how design choices impact gameplay using a simplified version of Hearthstone for case studies (Zook et al., 2015; Zook and Riedl, 2018). Jin proposed a method for measuring card balance and consequently deck strength (Jin, 2018), while Janusz et al. investigated card similarity based on their text embedding (Janusz and Slezak, 2018).

Previous work on evolutionary Hearthstone deckbuilding in particular employed non-QD evolutionary algorithms to generate better starter decks by evaluating decks against a single AI opponent (Bhatt et al., 2018). García-Sánchez et al. (García-Sánchez et al., 2016, 2018) similarly evaluated evolved Hearthstone decks against a suite of opponents. Similar approaches proved successful for the deckbuilding game Magic: The Gathering (Bjørke and Fludal, 2017).

3. Approach

This section discusses the parameters of MESB for deckbuilding in this paper and what distinguishes the modification from the traditional MAP-Elites algorithm.

3.1. Mutation and Fitness

Mutation is performed by replacing a randomly chosen number of cards with cards drawn at random from the pool of basic and classic cards, such that the result is a valid deck. The number of cards exchanged varies geometrically, so that small changes are more likely than large ones. Geometric mutation was chosen to satisfy the maximum entropy principle. Any deck can be reached from any other through a bounded number of mutations (i.e. by swapping each card in the source deck with a differing card in the destination deck). Fitness is the sum of differences in hero health over 200 games, where a positive health difference results from victory and a negative health difference results from defeat. The health difference is used as fitness rather than the win rate so that the magnitude of victory or defeat is preserved.
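As an illustration of geometric mutation on decks, the sketch below draws the number of swapped cards from a geometric distribution and replaces that many cards while respecting the copy limits described in section 2.1. The success probability `p=0.5`, the function names, and the card identifiers are assumptions for this example; the exact distribution parameters used in the experiments are not reproduced here.

```python
import random
from collections import Counter

DECK_SIZE = 30


def sample_swap_count(rng, p=0.5, max_swaps=DECK_SIZE):
    """Draw k >= 1 with P(k) proportional to (1 - p)**(k - 1), capped."""
    k = 1
    while k < max_swaps and rng.random() > p:
        k += 1
    return k


def mutate_deck(deck, card_pool, legendary, rng, p=0.5):
    """Replace k randomly chosen cards while keeping the deck valid.

    A valid deck holds at most two copies of a card and at most one
    copy of a legendary card.
    """
    deck = list(deck)
    k = sample_swap_count(rng, p)
    for slot in rng.sample(range(len(deck)), k):
        counts = Counter(deck)
        counts[deck[slot]] -= 1  # the card in this slot is being removed
        legal = [card for card in card_pool
                 if card != deck[slot]
                 and counts[card] < (1 if card in legendary else 2)]
        deck[slot] = rng.choice(legal)
    return deck
```

With `p=0.5`, half of all mutations swap a single card, a quarter swap two, and so on, so most offspring stay close to their parent while large jumps remain possible.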

3.2. MAP-Elites with Sliding Boundaries

When considering behavior vectors, typical implementations of MAP-Elites uniformly set the boundary lines between cells, implying a bounded behavior space. Additionally, if the distribution of the behavior space is not uniform, the space illuminated by MAP-Elites will not accurately match the true distribution of the feature space. Lastly, it can be difficult to know the distribution of individuals along a feature vector a priori. To solve these issues, the novel sliding boundaries modification is introduced. Instead of placing boundaries uniformly by feature value, the boundaries are placed uniformly at percentage marks of the distribution (see figure 2). To set these boundaries, a population of the most recent individuals is maintained in a fixed-size queue data structure. A remap frequency is also specified: each time that many individuals have been evaluated, the boundary lines for the map are recalculated. They are recalculated by sorting the individuals along each feature and finding each individual at the corresponding percentage mark. Maintaining a sampling of the behavior space enables the estimation of the true distribution of the search space. Using binary search, queries for the proper cell can be executed in O(d log b), where b is the number of boundaries and d is the number of dimensions in the map. As remapping only happens periodically, the algorithm maintains good empirical performance. For all experiments, the queue is large enough that all discovered individuals are used to draw boundary lines, and the map boundaries are recalculated at a fixed interval of individuals.
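The sliding boundaries procedure can be sketched as follows. This is a simplified illustration under stated assumptions rather than the EvoSabber implementation: the class and method names are hypothetical, the queue size and remap interval are arbitrary, and re-inserting every queued individual after a remap is one plausible way to repopulate the map.

```python
import bisect
import random
from collections import deque


class SlidingBoundaryMap:
    """Map of elites whose cell boundaries follow the observed distribution."""

    def __init__(self, cells_per_dim, n_dims, queue_size, remap_every):
        self.cells_per_dim = cells_per_dim
        self.n_dims = n_dims
        self.recent = deque(maxlen=queue_size)  # (features, fitness, solution)
        self.remap_every = remap_every
        self.boundaries = [[] for _ in range(n_dims)]
        self.elites = {}  # cell index tuple -> (features, fitness, solution)
        self.added = 0

    def _cell(self, features):
        # One binary search per dimension over that dimension's boundaries.
        return tuple(bisect.bisect_right(self.boundaries[d], features[d])
                     for d in range(self.n_dims))

    def _place(self, features, fitness, solution):
        cell = self._cell(features)
        if cell not in self.elites or fitness > self.elites[cell][1]:
            self.elites[cell] = (features, fitness, solution)

    def _remap(self):
        n = len(self.recent)
        for d in range(self.n_dims):
            values = sorted(ind[0][d] for ind in self.recent)
            # Interior boundaries at evenly spaced percentage marks.
            self.boundaries[d] = [values[(i * n) // self.cells_per_dim]
                                  for i in range(1, self.cells_per_dim)]
        # Assumption: rebuild the map from the queued individuals.
        self.elites = {}
        for features, fitness, solution in self.recent:
            self._place(features, fitness, solution)

    def add(self, features, fitness, solution):
        self.recent.append((features, fitness, solution))
        self.added += 1
        if self.added % self.remap_every == 0:
            self._remap()
        else:
            self._place(features, fitness, solution)


rng = random.Random(0)
archive = SlidingBoundaryMap(cells_per_dim=4, n_dims=2,
                             queue_size=1000, remap_every=100)
for _ in range(1000):
    # Skewed toy features: most values cluster near zero.
    features = (rng.random() ** 3, rng.random() ** 3)
    archive.add(features, fitness=rng.random(), solution=None)
```

In this toy run a uniform grid over [0, 1] would put well over half of the individuals into the first bin of each dimension, whereas the percentile-based boundaries split them into roughly equal groups.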

Figure 2. Drawing Boundaries in the Behavior Space. Unlike original versions of MAP-Elites that draw uniform boundary lines regardless of the population density (left), MESB draws boundaries based on the number of individuals currently occupying regions of the behavior space (right).

3.3. MAP-Elites with Resolution Expansion

MAP-Elites implementations can vary in certain parameters. Some implementations use a fixed number of boundaries along a given feature. Other implementations scale the number of boundaries over time. MESB scales the resolution linearly: the map starts at an initial size and is incrementally expanded to its final size at uniform time intervals. Preliminary results showed better performance for this scaling method than for a fixed archive resolution.
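A linear resolution schedule of this kind could be written as below. The function name and the example sizes (2 to 20 cells per dimension over 10,000 evaluations) are hypothetical placeholders, not the values used in the experiments.

```python
def resolution_at(evaluations_done, total_evaluations, start_cells, final_cells):
    """Cells per dimension after a given number of evaluations.

    The map grows by one cell per dimension at uniform intervals until
    it reaches final_cells.
    """
    steps = final_cells - start_cells  # number of expansions needed
    if steps <= 0:
        return start_cells
    interval = max(1, total_evaluations // (steps + 1))
    return min(final_cells, start_cells + evaluations_done // interval)


# Hypothetical schedule: 2x2 map growing to 20x20 over 10,000 evaluations.
schedule = [resolution_at(e, 10000, 2, 20) for e in range(0, 10001, 100)]
```

Each time the resolution changes, the boundaries would need to be redrawn for the new number of cells, which fits naturally with the periodic remapping described in the previous section.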

3.4. Behavior Vectors

Knowing how the sampled distribution found by MAP-Elites varies from the complete distribution of all possible decks in the deckspace can help evaluate the sliding boundaries modification in MESB. Because it can be difficult to enumerate the distribution of all individuals in the deckspace for a set of behaviors, MESB distributes elites based on their fitness and genetic properties of the deck as behavior vectors. Such properties are statistical information that can be calculated before evaluation, making it possible to measure complete sets of possible distributions.

In MAP-Elites (and other QD algorithms), evolved individuals are usually characterized by behaviors observed during evaluation to maintain diversity. However, meaningful diversity can be characterized by information known a priori because gameplaying strategies in Hearthstone differ significantly based on the mana cost of cards in a deck. Such distributions of card costs are often called mana curves, with a cost between zero and ten on the x-axis and frequency between zero and thirty on the y-axis. They are important because they often characterize the type of gameplay strategies that can be successful. For instance, aggro strategies that play many cards early in the game often have few if any cards that cost over five mana. As an approximation of this mana curve, the behavior vector for each deck contains 1) the average mana cost of all 30 cards in the deck and 2) the variance of mana costs.
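The behavior vector is simple to compute from the mana costs alone. The sketch below assumes population variance (the text does not say whether population or sample variance is used) and lumps any cost above ten into the last bucket of the curve; the example deck is hypothetical.

```python
from statistics import mean, pvariance


def mana_curve(mana_costs, max_cost=10):
    """Histogram of card counts per mana cost (costs above max_cost are lumped)."""
    curve = [0] * (max_cost + 1)
    for cost in mana_costs:
        curve[min(cost, max_cost)] += 1
    return curve


def behavior_vector(mana_costs):
    """Return (average mana cost, variance of mana costs) for a deck."""
    return mean(mana_costs), pvariance(mana_costs)


# Hypothetical aggro-style deck: thirty cards, none costing more than four.
aggro_costs = [1] * 10 + [2] * 10 + [3] * 6 + [4] * 4
curve = mana_curve(aggro_costs)
average, variance = behavior_vector(aggro_costs)
```

For this deck the average cost is 64/30 (about 2.13) with a variance of roughly 1.05, which would place it toward the low-cost, low-variance corner of the behavior space.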

In addition to making it possible to calculate the true distribution of average mana and mana variance, these behavior vectors are associated with different styles of play. For instance, cards with a low mana cost are typically played early in the game, while cards with a high mana cost are played later. The average mana of the deck is a rough measure of which stage of the game the deck specializes in. When deckbuilding, players must balance the mana cost of their deck, focusing on more than just the early or late stages of the game. Including the variance of the mana distribution helps measure how much the deck focuses on one stage of the game versus spreading out across other mana costs.

4. Experiments

Experiments were run on a high performance cluster with 500 CPU nodes running in parallel. (A parallel version of MAP-Elites was implemented to take advantage of the parallel nodes.) The code used to run these experiments is available on GitHub as a platform called EvoSabber. Three sets of experiments were performed:

  1. In the MESB validation experiment opponents play decks called starter decks, which are constructed with basic cards and available to any player. The goal is to explore whether MESB can generate a high variety of decks across different mana distributions, and whether these decks reflect mana curves appropriate for their archetype.

  2. The elite adversaries experiment then evolves new decks against the best decks found in the map of elites from the first experiment (in other words, with the decks evolved in the MESB validation experiment as adversaries). This experiment explores the ability to evolve effective counters to known strong decks.

  3. For the deck balancing experiment, we first perform an Apriori analysis (Agrawal and Srikant, 1994) on all elites in the decks of the different archetypes to identify the most commonly occurring combinations of cards, and we then explore the space of decks after altering these cards to intentionally affect balancing. In part this experiment is designed to explore whether MESB is a suitable potential tool for game designers.
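The frequent-pattern step in the third experiment can be illustrated with a small level-wise search in the style of Apriori. This is a simplified sketch, not the analysis code used for the paper: the function name, the support threshold, and the tiny example "decks" (short card lists built from card names that appear later in the results) are assumptions, and duplicate copies of a card within a deck are ignored.

```python
from itertools import combinations


def frequent_itemsets(decks, min_support, max_size=3):
    """Find card sets that occur in at least min_support of the decks."""
    transactions = [frozenset(deck) for deck in decks]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single cards.
    current = {}
    for card in sorted({c for t in transactions for c in t}):
        s = support(frozenset([card]))
        if s >= min_support:
            current[frozenset([card])] = s
    result = dict(current)

    size = 2
    while current and size <= max_size:
        # Join frequent sets and keep a candidate only if all of its
        # (size - 1)-subsets are frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                    frozenset(sub) in current
                    for sub in combinations(union, size - 1)):
                candidates.add(union)
        current = {}
        for candidate in candidates:
            s = support(candidate)
            if s >= min_support:
                current[candidate] = s
        result.update(current)
        size += 1
    return result


# Hypothetical miniature "decks" for illustration only.
decks = [
    ["Sunwalker", "Argent Squire", "Young Priestess"],
    ["Sunwalker", "Argent Squire", "Master Swordsmith"],
    ["Sunwalker", "Argent Squire"],
    ["Sunwalker", "Earthen Ring Farseer"],
]
patterns = frequent_itemsets(decks, min_support=0.75)
```

Here only Sunwalker, Argent Squire, and the pair of the two clear the 75% threshold; on a real map of elites the same procedure would surface the cards and card combinations that recur across cells.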

In each of these three experiments, three different deck configurations (hunter, paladin, and warlock) were paired with both aggro and control strategies. The hunter, paladin, and warlock configurations were selected in part for their reputations of supporting both aggro and control play styles well. Each combination of deck configuration and strategy was evolved by MESB for 10,000 evaluations (with 200 simulated games per evaluation) per experiment.

The goal in each experiment is to evolve a set of high-performing decks that vary in terms of mana curve (which serves as a proxy for strategy type). As a reminder, fitness for the purpose of MESB elite selection in these experiments is quantified as the sum of differences in hero health over 200 games. However, winrate (the percentage of games won, regardless of health difference at end of game) is chosen to illustrate performance in these experiments because winrate determines player rankings in real-world Hearthstone games. The two-dimensional behavior space for MAP-Elites in these experiments has the average mana cost of all cards in a deck on one axis and the variance of these values on the other axis. Such a behavior space not only approximates a diverse strategy space but also affords easy performance visualization. Understanding how well MESB covers the space of mana curves is a good indicator of the algorithm’s potential for generating nontrivially distinct decks.
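The difference between the two measures is easiest to see with numbers. In the sketch below, each game is summarized by a single signed health difference (assumed here to be the evolved deck's hero health minus the opponent's at the end of the game, positive for a win); the function name and the four-game samples are hypothetical.

```python
def fitness_and_winrate(health_diffs):
    """Summarize a series of games by summed health difference and winrate."""
    fitness = sum(health_diffs)
    wins = sum(1 for diff in health_diffs if diff > 0)
    return fitness, wins / len(health_diffs)


# Two hypothetical decks that each win half of their games.
narrow_losses = fitness_and_winrate([10, 12, -3, -1])
heavy_losses = fitness_and_winrate([1, 2, -20, -15])
```

Both decks have a winrate of 0.5, but their fitness values are 18 and -32 respectively, so the health-based fitness separates a deck that loses narrowly from one that loses badly.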

5. Results

5.1. MESB Validation

Figure 3. Distributions of Deck Performances Evolved Against Starter Decks in the MESB Validation Experiment. Panels: (a) Control Hunter, (b) Control Paladin, (c) Control Warlock, (d) Aggro Hunter, (e) Aggro Paladin, (f) Aggro Warlock. Darker hues indicate positive performance (quantified by fitness). Scatterplots for different deck configurations and strategies exhibit different shapes. While the x and y axes are standard across each of the scatter plots, to better visualize patterns in archetypes hue refers to relative fitnesses described in the corresponding legends. Both the paladin and warlock aggro archetypes show stronger hero-relative decks where average mana cost is low. However, strong hero-relative control decks exist across a range of average mana costs. Interestingly, strong hero-relative decks for both aggro and control hunters exist when average mana cost is low, potentially indicating that this archetype is nonviable.

To give an intuition for MESB’s performance with respect to multiple play styles, figure 3 shows the distribution of elite decks from the initial MESB validation experiment plotted in mana curve space. Circles in figure 3 represent decks in the map of elites, while hue indicates fitness (summed difference in health at the end of each simulated game). Darker hues correspond to higher performance. The (x,y) coordinates in these plots represent position in approximated mana curve space. For example, in figure 3a, control hunter decks with low average mana cost (toward the left of the image) trend toward higher fitness than those with high cost (toward the right). Decks with lower variance and higher average mana cost (lower right) trend toward lower fitness, indicating that decks with only higher cost cards lose before they can play these powerful cards.

Like aggro decks played by humans, trends in figures 3d, 3e, and 3f show higher performance with low mana cost. If the decks have high mana cost, they can perform well when variance is high, indicating there are sufficiently many low-cost cards to be played early in the game. While high variance in cost can mitigate some impact of having a high average mana cost, in all three maps of elites played with the aggro strategy, high mana cost and low variance significantly impact the ability of the player to execute a successful aggro strategy. Too many high cost cards leave the player unable to defend or attack in early turns of the game.

In figures 3b and 3c, MESB discovers high performing decks for the control strategy across a spectrum of mana curves. Interestingly, the plot for the control hunter in figure 3a is similar to the plot of the aggro hunter in figure 3d. While the plot may indicate that hunter decks are naturally suited to aggro strategies, the aggro hunter loses to all of the other decks and gameplay strategies when the decks are compared (i.e. when they are played against each other 1000 times). In fact all of the control strategies win against their corresponding aggro strategies. While these plots are only shown for the three classes and two strategies in this experiment, class appears to impact the shape of these plots and is further discussed in section 6.

To ensure that MESB effectively explored the mana curve space, figure 4 compares the mana curves discovered by MAP-Elites to the distribution of mana curves across the full deckspace. While the two distributions have different shapes, the individuals observed by MESB span the majority of the mana curve space.

Figure 4. Density Distributions of Deck Populations. Panels: (a) Distribution of Decks in MESB Population, (b) Distribution of All Possible Decks in the Behavior Space. Darker cells denote a higher density of decks. In (a) all 10,000 generated decks are plotted in behavior space. This map represents the control paladin decks in the elite adversaries experiment. The distribution of all possible decks is calculated and illustrated in the behavior space in (b). While MESB does oversample some regions of decks in the mana curve space, it is possible that those areas are where the highest performing decks are located. The ranges of average mana costs and variances in (a) match those in the true distribution in (b).

5.2. Performance of the Map of Elites

Figure 5. Best and Average Winrates for MESB Validation, Elite Adversaries, and Deckbalancing Experiments. The aggro hunter strategy generally performs worse than other deck configuration strategy combinations in all experiments. In general, decks evolved in preliminary MESB validation experiments win more games than decks evolved against elite adversaries and decks with nerfed and buffed cards (weakened and strengthened, respectively, to balance the game).

For each experiment, figure 5 shows the best and average winrates of the elite decks evolved with control and aggro strategies. Best and average winrates increase over time in all experiments. The aggro hunter decks in general perform significantly worse than other deck configuration strategy combinations.

Figure 5 shows that the best and average winrates in the MESB validation experiment were higher than those in the elite adversaries experiment. This result is expected given that opponents for evolved decks in the initial MESB validation experiment were less powerful than those faced by the evolved decks in the subsequent elite adversaries experiment, though it is important to note that a lower winrate (against more challenging opponents) does not necessarily imply lower performance in general. In fact, when decks were compared against each other, the elites from the elite adversaries experiment performed better than elites from the corresponding MESB validation experiment. Generally, evolution should more easily identify high performing decks that evolved against weaker opponents because there are likely more combinations that perform well. Winrates for all deck configuration strategy combinations in the elite adversaries experiment are lower than those in the MESB validation experiment, potentially suggesting that MESB initially found decks that won relatively consistently when evolving against weaker enemies.

5.3. Measuring the Effect of Balance Changes

Figure 6. Changes in Card Frequencies after Nerfs and Buffs (Deckbalancing Experiment). Cards with high frequency are nerfed to weaken them and encourage exploration, whereas cards with low frequency may be buffed to incentivize inclusion in decks. An indicates occurrence in 25% or fewer decks.

Results from the elite adversaries experiment suggest that even though high performing decks are distributed over a range of mana curves, decks often find specific cards critical to their performance. For example, as illustrated in figure 6, Sunwalker is present in 99% of the control paladin decks, Earthen Ring Farseer is in 96.5% of aggro hunter decks, and Explosive Shot in 70% of control hunter decks. While it is possible that these cards push search toward local optima, it is also possible that they are core elements of successful deck and strategy combinations. For the deckbalancing experiment, the value of one card shared by the majority of individuals in each archetype (e.g. control paladin, aggro warlock, etc.) is decreased to explore whether MAP-Elites can help evaluate the impact of card balancing on evolved decks. The mana costs of the Earthen Ring Farseer, Explosive Shot, Argent Squire, and Young Priestess are increased by two, the Sunwalker’s attack is decreased by three, and the Master Swordsmith’s health is decreased by two.

With the exception of two cards (Explosive Shot and Master Swordsmith), reducing the power of a card (i.e. nerfing it) also reduces its presence in the map of elites shown in table 6. Three of the remaining four cards are found in 25% or fewer of the decks in the map of elites for their archetype. The fourth (the Sunwalker card for control paladin) is reduced to 28.8%. Interestingly, increasing the attack power of the Silverback Patriarch by two increases its presence above 25% for five of the six deck configurations. Before reducing the power of cards, most of the cards were present in only one or two different class and gameplay strategy archetypes.
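A before-and-after comparison of this kind can be sketched in a few lines of Python. The function name, the dictionary layout, and the second card are hypothetical; only the Sunwalker figures (99% before, 28.8% after) and the 25% threshold come from the text.

```python
def summarize_change(before, after, threshold=0.25):
    """Label each card by how its deck occurrence moved after a balance change.

    `before` and `after` map card names to the fraction of elite decks
    containing the card; a card missing from a map is treated as 0.0.
    """
    summary = {}
    for card in sorted(set(before) | set(after)):
        old = before.get(card, 0.0)
        new = after.get(card, 0.0)
        if new > old:
            direction = "increased"
        elif new < old:
            direction = "decreased"
        else:
            direction = "unchanged"
        summary[card] = {
            "delta": new - old,
            "direction": direction,
            # True when the card appears in `threshold` or fewer of the decks.
            "rare_after": new <= threshold,
        }
    return summary


# Sunwalker values are from the text; "Hypothetical Card" is made up.
before = {"Sunwalker": 0.99, "Hypothetical Card": 0.10}
after = {"Sunwalker": 0.288, "Hypothetical Card": 0.40}
report = summarize_change(before, after)
```

In this toy report, Sunwalker is labeled as decreased but not rare, since 28.8% remains above the 25% threshold.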

Similarly, nerfing Explosive Shot and Master Swordsmith increased their occurrence in aggro and control hunter decks and in control hunter and aggro paladin decks, respectively. Explosive Shot is a spell card that costs five mana to play and does five damage to a minion and two damage to its immediate neighbors. The nerf for Explosive Shot is an increase in mana cost of two. While decks playing this card may benefit from a slightly different mana distribution due to less competition in a MAP-Elites cell, it is also possible that the additional cost forces the greedy, short-sighted strategies to play the card later in the game, when there are potentially more cards on the board and more advantage to gain. This deckbalancing experiment demonstrates how MAP-Elites can potentially uncover complex relationships in gameplay (e.g. that it is a benefit to the card holder to force a card to be played later in the game).
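For concreteness, the balance changes described in this section can be written as a table of stat deltas and applied to a card pool before rerunning the search. The sketch below is an illustration only: the dictionary layout, the stat keys (`cost`, `attack`, `health`), and the function name are assumptions, and the only base stat taken from the text is Explosive Shot's cost of five mana.

```python
# Stat deltas described in the text: six nerfs and one buff.
BALANCE_CHANGES = {
    "Earthen Ring Farseer": {"cost": +2},
    "Explosive Shot": {"cost": +2},
    "Argent Squire": {"cost": +2},
    "Young Priestess": {"cost": +2},
    "Sunwalker": {"attack": -3},
    "Master Swordsmith": {"health": -2},
    "Silverback Patriarch": {"attack": +2},
}


def apply_changes(card_stats, changes):
    """Return a new card pool with the stat deltas applied.

    `card_stats` maps card name -> {stat name: value}. The input is not
    modified, so the original and rebalanced pools can be compared.
    """
    patched = {}
    for name, stats in card_stats.items():
        new_stats = dict(stats)
        for stat, delta in changes.get(name, {}).items():
            new_stats[stat] = new_stats[stat] + delta
        patched[name] = new_stats
    return patched


# Only Explosive Shot's base cost (five mana) is stated in the text.
original_pool = {"Explosive Shot": {"cost": 5}}
rebalanced_pool = apply_changes(original_pool, BALANCE_CHANGES)
```

Keeping the changes as data separate from the card pool makes it straightforward to rerun the same search with and without a given nerf.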

The occurrence of the Silverback Patriarch varies between class and strategy archetypes. While originally it is only present in aggro hunter decks, after the buff it is present in all but the aggro hunter decks. Because the card costs the same amount of mana in both experiments, it is unclear what caused an aggro deck to abandon a taunt card with higher attack power. Perhaps more runs of MAP-Elites are necessary to make claims about individual cards. However, MAP-Elites still suggests interesting trends about the Silverback Patriarch in other decks. For example, with the control paladin, the buffed Silverback Patriarch is popular in a cluster of decks with higher variance and lower cost. Among the same set of post-nerf control paladin decks, the Sunwalker’s reduction in attack points pushed it in a seemingly unrelated direction, toward decks with a higher average mana cost. Such effects can be difficult to predict or even detect in complex systems, and their discovery here indicates a preliminary effectiveness of the MAP-Elites algorithm for game testing, balance, and design.

As a result of rebalancing the cards (i.e. the nerfs and buffs) and rerunning MAP-Elites, successful aggro paladin decks included at least one Flesheating Ghoul card in 100% of decks. One hypothesis is that by increasing the cost of several cards at once, fewer of the low-cost minions that aggro paladins rely on were available for deckbuilding. Flesheating Ghoul likely filled the vacuum left by the more expensive nerfed Young Priestess. Identifying how card rebalances affect the performance of specific decks (such as those made by the Hearthstone community) is an area for future work.

6. Discussion

Results from all of the experiments illustrate that it is possible to generate high-performing decks across a range of mana distributions for a variety of playstyles. While these experiments add only one extra card set (classic cards) relative to the optimization approach described by Bhatt et al. (2018), each additional set can introduce new balances and imbalances that could make it difficult to evolve a variety of decks. Transitivity is a property that designers need to balance well to ensure good gameplay, and often cards are changed after their release to encourage such variety. However, the fact that MESB can successfully find this range of strategies corroborates the usefulness of the algorithm as a tool for deck space analysis.

By examining the distribution of high-performing decks on the map of elites, it appears that certain gameplay strategies require particular mana distributions. For example, for each of the three classes playing an aggro strategy, the highest performing decks had a low average mana cost. While this result is intuitive, it is conceivable and likely that some high-cost cards could benefit an aggro playstyle. Interestingly, the correlation between low mana and good aggro decks is stronger for hunter and warlock than for paladin. Again, MAP-Elites helps provide a non-trivial insight into the design of the game and its cards.
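The relationship between mana distribution and performance can be illustrated with a short Python sketch that summarizes a deck by the mean and variance of its card costs and then measures the linear correlation between average mana cost and winrate. The use of population variance, the Pearson coefficient, and all numbers below are assumptions for illustration; the text does not state how the correlation was quantified.

```python
from statistics import mean, pvariance


def mana_features(costs):
    """Return (average mana cost, population variance) for a deck's card costs."""
    return mean(costs), pvariance(costs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Made-up aggro elites: average mana cost of each deck and its winrate.
avg_mana = [2.0, 3.0, 4.0, 5.0]
winrate = [0.8, 0.7, 0.5, 0.4]
r = pearson(avg_mana, winrate)  # negative: cheaper decks win more often here
```

Computing such a coefficient separately per class would give one way to compare how strongly each class's aggro decks depend on a low mana curve.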

The balance change experiments demonstrate how MESB can support game design. Potentially overpowered cards and card combinations were identified with an Apriori analysis of evolved decks; the diversity assured by MESB implies that if a pattern is discovered in multiple decks, it is almost certainly powerful in a variety of settings (perhaps too powerful). After nerfing cards from these sets, new runs explored the impact on the distributions of cards. With the exception of the Silverback Patriarch that was present in more high-performing decks after a nerf, most nerfed cards were included in fewer high-performing decks. This increase is likely an artifact of characterizing the space of decks by mana curve. While the higher-cost version of this card is not inherently better than the lower-cost version, under MESB its higher cost may place it in a less competitive position in behavioral space.
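The Apriori analysis mentioned above mines card sets that co-occur in many evolved decks. The following is a compact pure-Python sketch of Apriori-style frequent itemset mining over decks. It is not the authors' implementation: the function name, the support threshold, and the toy decks are assumptions made for illustration.

```python
from itertools import combinations


def apriori(decks, min_support):
    """Return {frozenset(cards): support} for itemsets with support >= min_support."""
    transactions = [frozenset(deck) for deck in decks]
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset):
        # Fraction of decks that contain every card in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single cards.
    current = {}
    for card in {c for t in transactions for c in t}:
        s = support(frozenset([card]))
        if s >= min_support:
            current[frozenset([card])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-card candidates.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Made-up decks, reduced to three cards each for readability.
toy_decks = [
    ["Argent Squire", "Young Priestess", "Flesheating Ghoul"],
    ["Argent Squire", "Young Priestess"],
    ["Argent Squire", "Flesheating Ghoul"],
    ["Young Priestess", "Flesheating Ghoul"],
]
patterns = apriori(toy_decks, min_support=0.5)
```

With a support threshold of 0.5, the toy example keeps all three single cards and all three pairs but drops the three-card set, which appears in only one of the four decks.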

The main critique that could be leveled against the methodology employed in this paper is that it is dependent on a particular game-playing agent, namely the agent that comes with Sabberstone (the Hearthstone simulator used in these experiments). While it is unavoidable that any agent has a particular playstyle and will bias toward certain deck builds, the advantage of playing games with the Sabberstone agent is that it is well-tested and reasonably fast. Future work will explore the bias in these agents when compared to human playstyles. However, the same agent plays games in all of the experiments; its parameterization between aggro and control styles is varied to mitigate playstyle bias.

Future work will explore MESB and other modifications to MAP-Elites to investigate different problems in the space of Hearthstone decks. One question is whether different behavior vectors produce higher performing decks when played with a variety of agent strategies (beyond the aggro and control strategies in this paper). Such experiments could facilitate understanding of agent playstyles and preferences, or the formulation of more interesting behavior vectors to represent the decks. One particularly interesting avenue would be to attempt targeted rebalancing, to intentionally cultivate a metagame that favors a specific class or strategy. This methodology could help validate that balance changes affect the spaces in meaningful ways. Alternatively, it would also be interesting to observe the impact of new cards on the metagame, or even the evolution of new cards so as to occupy an unclaimed part of deck space.

7. Conclusion

This paper explored deckbuilding in the game Hearthstone through a novel modification of the MAP-Elites algorithm that introduced sliding grid cell boundaries (MESB). A series of experiments revealed that MESB is able to discover high-performing decks in a variety of strategy spaces in addition to revealing potentially novel gameplay relationships. Not only do these results offer practical implications for improving playability of a massively popular real-world game, they also expand the reach of quality diversity algorithms beyond evolutionary robotics and into a new type of domain that can inform our theoretical understanding of this promising new class of algorithms.


  • hea ([n. d.]) [n. d.]. Hearthstone. ([n. d.]). Accessed: 2018-03-30.
  • Agrawal and Srikant (1994) R. Agrawal and Ramakrishnan Srikant. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB).
  • Bhatt et al. (2018) Aditya Bhatt, Scott Lee, Fernando de Mesentier Silva, Connor W Watson, Julian Togelius, and Amy K Hoover. 2018. Exploring the hearthstone deck space. In Proceedings of the 13th International Conference on the Foundations of Digital Games. ACM, 18.
  • Bjørke and Fludal (2017) Sverre Johann Bjørke and Knut Aron Fludal. 2017. Deckbuilding in Magic: The Gathering Using a Genetic Algorithm. Master’s thesis. Norwegian University of Science and Technology (NTNU).
  • Browne and Maire (2010) Cameron Browne and Frederic Maire. 2010. Evolutionary game design. IEEE Transactions on Computational Intelligence and AI in Games 2, 1 (2010), 1–16.
  • Bursztein (2016) Elie Bursztein. 2016. I am a legend: Hacking hearthstone using statistical learning methods. In Computational Intelligence and Games (CIG), 2016 IEEE Conference on. IEEE, 1–8.
  • Chen et al. (2018) Zhengxing Chen, Christopher Amato, Truong-Huy D Nguyen, Seth Cooper, Yizhou Sun, and Magy Seif El-Nasr. 2018. Q-deckrec: A fast deck recommendation system for collectible card games. In 2018 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 1–8.
  • Cully et al. (2015) Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret. 2015. Robots that can adapt like animals. Nature 521, 7553 (2015), 503.
  • Cully and Mouret (2013) Antoine Cully and Jean-Baptiste Mouret. 2013. Behavioral Repertoire Learning in Robotics. In Proceedings of the 15th annual conference on Genetic and evolutionary computation (GECCO ‘13). 175–182.
  • Cully and Mouret (2016) Antoine Cully and Jean-Baptiste Mouret. 2016. Evolving a Behavioral Repertoire for a Walking Robot. Evolutionary Computation 24 (2016), 59–88. Issue 1.
  • da Silva and Goes (2018) Alysson Ribeiro da Silva and Luis Fabricio Wanderley Goes. 2018. HearthBot: An Autonomous Agent Based on Fuzzy ART Adaptive Neural Networks for the Digital Collectible Card Game HearthStone. IEEE Transactions on Games 10, 2 (2018), 170–181.
  • de Mesentier Silva et al. (2016) Fernando de Mesentier Silva, Aaron Isaksen, Julian Togelius, and Andy Nealen. 2016. Generating heuristics for novice players. In Computational Intelligence and Games (CIG), 2016 IEEE Conference on. IEEE, 1–8.
  • de Mesentier Silva et al. (2018a) Fernando de Mesentier Silva, Christoph Salge, Aaron Isaksen, Julian Togelius, and Andy Nealen. 2018a. Drawing without replacement as a game mechanic. In Proceedings of the 13th International Conference on the Foundations of Digital Games. ACM, 57.
  • de Mesentier Silva et al. (2018b) Fernando de Mesentier Silva, Julian Togelius, Frank Lantz, and Andy Nealen. 2018b. Generating Beginner Heuristics for Simple Texas Hold’em. (2018).
  • de Mesentier Silva et al. (2018c) Fernando de Mesentier Silva, Julian Togelius, Frank Lantz, and Andy Nealen. 2018c. Generating novice heuristics for post-flop poker. In 2018 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 1–8.
  • Dockhorn et al. (2018) Alexander Dockhorn, Max Frick, Ünal Akkaya, and Rudolf Kruse. 2018. Predicting Opponent Moves for Improving Hearthstone AI. In International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems. Springer, 621–632.
  • Dormans (2011) Joris Dormans. 2011. Simulating Mechanics to Study Emergence in Games. Artificial Intelligence in the Game Design Process 2, 6.2 (2011), 5–2.
  • García-Sánchez et al. (2018) Pablo García-Sánchez, Alberto Tonda, Antonio M Mora, Giovanni Squillero, and Juan Julián Merelo. 2018. Automated playtesting in collectible card games using evolutionary algorithms: A case study in hearthstone. Knowledge-Based Systems 153 (2018), 133–146.
  • García-Sánchez et al. (2016) Pablo García-Sánchez, Alberto Tonda, Giovanni Squillero, Antonio Mora, and Juan J Merelo. 2016. Evolutionary Deckbuilding in Hearthstone. In Computational Intelligence and Games (CIG), 2016 IEEE Conference on. IEEE, 1–8.
  • Grad (2017) Łukasz Grad. 2017. Helping AI to play hearthstone using neural networks. In 2017 federated conference on computer science and information systems (FedCSIS). IEEE, 131–134.
  • Gravina et al. (2016) D. Gravina, A. Liapis, and G. N. Yannakakis. 2016. Constrained surprise search for content generation. In 2016 IEEE Conference on Computational Intelligence and Games (CIG).
  • Hom and Marks (2007) Vincent Hom and Joe Marks. 2007. Automatic Design of Balanced Board Games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE). 25–30.
  • Isaksen et al. (2015) Aaron Isaksen, Dan Gopstein, Julian Togelius, and Andy Nealen. 2015. Discovering Unique Game Variants. In Computational Creativity and Games Workshop at the 2015 International Conference on Computational Creativity.
  • Isaksen et al. (2016) Aaron Isaksen, Christoffer Holmgård, Julian Togelius, and Andy Nealen. 2016. Characterising Score Distributions in Dice Games. Game and Puzzle Design 2, 1 (2016).
  • Jaffe et al. (2012) Alexander Jaffe, Alex Miller, Erik Andersen, Yun-En Liu, Anna Karlin, and Zoran Popovic. 2012. Evaluating Competitive Game Balance with Restricted Play. In Proceedings of the Eighth Artificial Intelligence and Interactive Digital Entertainment International Conference (AIIDE 2012).
  • Jakubik (2017) Jan Jakubik. 2017. Evaluation of Hearthstone Game States with Neural Networks and Sparse Autoencoding. In Computer Science and Information Systems (FedCSIS), 2017 Federated Conference on. IEEE, 135–138.
  • Janusz and Slezak (2018) Andrzej Janusz and Dominik Slezak. 2018. Investigating Similarity between Hearthstone Cards: Text Embeddings and Interchangeability Approaches. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 3421–3426.
  • Janusz et al. (2017) Andrzej Janusz, Tomasz Tajmajer, and Maciej Świechowski. 2017. Helping AI to Play Hearthstone: AAIA’17 Data Mining Challenge. In Computer Science and Information Systems (FedCSIS), 2017 Federated Conference on. IEEE, 121–125.
  • Janusz et al. (2018) Andrzej Janusz, Tomasz Tajmajer, Maciej Świechowski, Łukasz Grad, Jacek Puczniewski, and Dominik Ślezak. 2018. Toward an Intelligent HS Deck Advisor: Lessons Learned from AAIA’18 Data Mining Competition. In 2018 Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE, 189–192.
  • Jin (2018) Yuanzhe Jin. 2018. Proposed Balance Model for Card Deck Measurement in Hearthstone. The Computer Games Journal (2018), 1–16.
  • Khalifa et al. (2018) Ahmed Khalifa, Scott Lee, Andy Nealen, and Julian Togelius. 2018. Talakat: Bullet Hell Generation through Constrained Map-Elites. arXiv preprint arXiv:1806.04718 (2018).
  • Kowalski et al. (2018) Jakub Kowalski, Radosław Miernik, Piotr Pytlik, Maciej Pawlikowski, Krzysztof Piecuch, and Jakub Sekowski. 2018. Strategic Features and Terrain Generation for Balanced Heroes of Might and Magic III Maps. In 2018 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 1–8.
  • Lehman and Stanley (2008) Joel Lehman and Kenneth O. Stanley. 2008. Exploiting open-endedness to solve problems through the search for novelty. In Proceedings of the Eleventh International Conference on Artificial Life (Alife XI). 329–336.
  • Lehman and Stanley (2011) Joel Lehman and Kenneth O. Stanley. 2011. Evolving a diversity of virtual creatures through novelty search and local competition. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO ‘11). 211–218.
  • Mahlmann et al. (2012) Tobias Mahlmann, Julian Togelius, and Georgios N Yannakakis. 2012. Evolving Card Sets towards Balancing Dominion. In 2012 IEEE Congress on Evolutionary Computation. IEEE, 1–8.
  • Mouret (2011) Jean-Baptiste Mouret. 2011. Novelty-based Multiobjectivization. In New Horizons in Evolutionary Robotics: Extended Contributions from the 2009 EvoDeRob Workshop. 139–154.
  • Preuss et al. (2018) Mike Preuss, Thomas Pfeiffer, Vanessa Volz, and Nicolas Pflanzl. 2018. Integrated Balancing of an RTS Game: Case Study and Toolbox Refinement. In 2018 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 1–8.
  • Pugh et al. (2016) Justin K. Pugh, Lisa B. Soros, and Kenneth O. Stanley. 2016. Quality Diversity: A New Frontier for Evolutionary Computation. Frontiers in Robotics and AI 3 (2016), 40.
  • Santos et al. (2017) André Santos, Pedro A Santos, and Francisco S Melo. 2017. Monte Carlo Tree Search Experiments in Hearthstone. In Computational Intelligence and Games (CIG), 2017 IEEE Conference on. IEEE, 272–279.
  • Stiegler et al. (2017) Andreas Stiegler, Keshav Dahal, Johannes Maucher, and Daniel Livingstone. 2017. Symbolic Reasoning for Hearthstone. IEEE Transactions on Computational Intelligence and AI in Games (2017).
  • Stiegler et al. (2016) Andreas Stiegler, Claudius Messerschmidt, Johannes Maucher, and Keshav Dahal. 2016. Hearthstone deck-construction with a utility system. In Software, Knowledge, Information Management & Applications (SKIMA), 2016 10th International Conference on. IEEE, 21–28.
  • Świechowski et al. (2018) Maciej Świechowski, Tomasz Tajmajer, and Andrzej Janusz. 2018. Improving Hearthstone AI by Combining MCTS and Supervised Learning Algorithms. In 2018 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 1–8.
  • Togelius and Schmidhuber (2008) Julian Togelius and Jürgen Schmidhuber. 2008. An experiment in automatic game design. In CIG. 111–118.
  • Volz et al. (2016) Vanessa Volz, Günter Rudolph, and Boris Naujoks. 2016. Demonstrating the Feasibility of Automatic Game Balancing. In Proceedings of the 2016 on Genetic and Evolutionary Computation Conference. ACM, 269–276.
  • Zhang and Buro (2017) Shuyi Zhang and Michael Buro. 2017. Improving Hearthstone AI by Learning High-level Rollout Policies and Bucketing Chance Node Events. In Computational Intelligence and Games (CIG), 2017 IEEE Conference on. IEEE, 309–316.
  • Zook et al. (2015) Alexander Zook, Brent Harrison, and Mark O Riedl. 2015. Monte-Carlo Tree Search for Simulation-based Strategy Analysis. In Proceedings of the 10th Conference on the Foundations of Digital Games.
  • Zook and Riedl (2018) Alexander Zook and Mark Riedl. 2018. Learning How Design Choices Impact Gameplay Behavior. IEEE Transactions on Games (2018).