Design of Emergent and Adaptive Virtual Players in a War RTS Game

01/15/2012 ∙ by Jose A. García Gutiérrez, et al. ∙ University of Malaga

Basically, in (one-player) war real-time strategy (wRTS) games a human player controls, in real time, an army consisting of a number of soldiers, and her aim is to destroy the opponent's assets, where the opponent is a virtual (i.e., non-human) player that usually consists of a pre-programmed decision-making script. Such scripts suffer from well-known problems (e.g., predictability, non-rationality, repetitive behaviors, and a sensation of artificial stupidity, among others). This paper describes a method for the automatic generation of virtual players that adapt to the player's skills; this is done by initially building a model of the player's behavior in real time during the game, and then evolving the virtual player via this model in between two games. The paper also shows preliminary results obtained on a one-player wRTS game constructed specifically for experimentation.




1 Introduction and related work

In an era in which computing power has boosted the graphical quality of videogames, players have turned their attention to other aspects of the game. In particular, they mostly request opponents exhibiting intelligent behavior. However, intelligence does not generally equate to playing proficiency but rather to interesting behaviors [liden:artifical-stupidity-game-wisdom2-2004]. This issue is specifically relevant in real-time strategy (RTS) games, which often employ two kinds of artificial intelligence (AI) [ahlquist+:game_ai08]: one represented by a virtual player (VP, or non-player character, NPC) making decisions over a set of units (i.e., warriors, machines, etc.), and another one corresponding to the small units themselves (usually with little or no intelligence). The design of these AIs is a complex task, and the reality in the industry is that in most RTS games the NPC is basically controlled by a fixed script that has been previously programmed based on the experience of the designer/programmer. This script often comprises hundreds of rules of the form "if the game is in state S then unit U should execute action A" to control the behavior of the components (i.e., units) under its control. This gives rise to known problems. For players, the opponent behavior guided by fixed scripts can become predictable to the experienced player. For AI programmers, the design of virtual players (VPs) can be frustrating because games are becoming more and more complex, with hundreds of different situations/states, so it is not easy to anticipate all the situations that could potentially happen, and even more difficult to decide the most appropriate action to take in each of them. As a consequence, many RTS games contain 'holes', in the sense that the game stagnates or behaves incorrectly under very specific conditions (these problems fall into the category of 'artificial stupidity' [liden:artifical-stupidity-game-wisdom2-2004]). Thus the realism of the simulation is drastically reduced, and with it the interest of the player.

In addition, there are very interesting research problems in developing AI for Real-Time Strategy (RTS) games including planning in an uncertain world with incomplete information, learning, opponent modeling, and spatial and temporal reasoning [buro:AAAI2004]. This AI design is usually very hard due to the complexity of the search space (states describe large playing scenarios with hundreds of units simultaneously acting). Particular problems are caused by the large search spaces (environments consisting of many thousands of possible positions for each of hundreds, possibly thousands, of units) and the parallel nature of the problem - unlike traditional games, any number of moves may be made simultaneously [DBLP:conf/gameon/CorrubleMR02]. Qualitative spatial reasoning (QSR) techniques can be used to reduce complex spatial states (e.g., using abstract representations of the space [DBLP:journals/expert/ForbusMD02]). Regarding evolutionary techniques, a number of biologically-inspired algorithms and multi-agent based methods have already been applied to handle many of the mentioned problems in the implementation of RTS games [DBLP:journals/tec/LouisM05, DBLP:journals/tec/StanleyBM05, DBLP:conf/cig/Livingstone05, miles+:co-evol-rts2006, DBLP:conf/evoW/LichockiKJ09, beume+:cig2008, keaveney+:cig2009, DBLP:conf/aiide/HagelbackJ09].

Even when a very good script for a VP has been designed, the designer has to confront another well-known problem: the VP behavior is usually fixed and rarely adapts to the level (i.e., skills) of the player. In fact, the player can lose interest in the game because either she is able to beat all the opponents at every level, or the virtual player always beats her. In this sense, designing interesting (e.g., non-predictable) NPCs is not the only challenge: the NPC also has to adapt to the human player, since she might otherwise lose interest in the game (i.e., if the NPCs are too easy or too hard to beat).

There are many benefits in attempting to build adaptive learning AI systems, which may exist at multiple levels of the game hierarchy and which evolve over time. This paper deals precisely with the issue of generating behaviors for the virtual player that evolve in accordance with the player's increasing abilities. This behavior emerges according to the player's skill, and this emergent feature can make an RTS game more entertaining and less predictable, in the sense that emergent behavior is not explicitly programmed but simply happens [sweetser:emergence_in_games2008]. The attainment of adjustable and emergent virtual players consists here of two stages that are iteratively executed in sequence: (1) a behavior model of the human player is created in real time during the execution of a game, and (2) the virtual player is evolved off-line (i.e., in between two games) via evolutionary algorithms until it can compete with the player (not necessarily to beat her, but to keep her interest in the game). This approach has been applied to an RTS game constructed specifically for experimentation, and we report here our experience.

2 The Game

Here we describe the basic elements that compose our wRTS game.

Scenario. Also called, indistinctly, the region or the world, it consists of a two-dimensional, non-toroidal, heterogeneous, hostile, and dynamic grid-world. The world is heterogeneous because the terrain is not uniform, hostile because there exists a virtual army whose mission is to destroy the human player-controlled army, and dynamic because the game conditions change depending on the actions taken by units of both armies. Regarding the heterogeneity of the terrain, each grid cell in the region can have one of the three following values: passable (any unit can traverse this cell), impassable (no unit can traverse it), and semi-impassable (there is a penalization of 1 point of energy, see below, to traverse it).
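As a rough illustration, the three terrain types and their traversal rules could be modeled as follows (a minimal sketch; the class and method names are our own, not taken from the game's code):

```python
from enum import Enum

class Terrain(Enum):
    PASSABLE = 0         # any unit may traverse this cell, no penalty
    IMPASSABLE = 1       # no unit may traverse it
    SEMI_IMPASSABLE = 2  # traversable at a penalty of 1 energy point

class World:
    """Two-dimensional, non-toroidal grid-world with heterogeneous terrain."""
    def __init__(self, width, height, terrain=None):
        self.width, self.height = width, height
        self.grid = terrain or [[Terrain.PASSABLE] * width for _ in range(height)]

    def can_enter(self, x, y):
        # non-toroidal: positions outside the map are invalid
        if not (0 <= x < self.width and 0 <= y < self.height):
            return False
        return self.grid[y][x] is not Terrain.IMPASSABLE

    def energy_cost(self, x, y):
        # entering a semi-impassable cell costs 1 extra energy point
        return 1 if self.grid[y][x] is Terrain.SEMI_IMPASSABLE else 0
```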

Army. There are two armies (also called indistinctly teams, represented by spiders and ladybirds) with a number of units (also called indistinctly soldiers or agents) and a flag to be defended. One army is controlled by the human player whereas the other one is guided by some kind of artificial intelligence (AI).

Given a unit placed at a position in the grid, its visual range (VR) comprises every position located within a certain maximum distance of the unit. Initially, each agent only knows the grid cells that belong to its visual range. In fact, during the game the scenario is not completely known to an army: only those regions already visualized by its agents are known. The global information that each army possesses (independently of whether it is controlled by the AI or by the player) is the sum of all the information of its constituent soldiers. Note that a (human or virtual) player only knows the rival flag position if it has previously been detected in some cell by one of its agents. Also, each unit interacts with the environment by executing a number of actions (see below) and has a certain level of health and energy that decreases with these interactions. The initial values for the health and energy of each soldier are 100 and 1000 respectively.
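The visual-range and shared-knowledge mechanics can be sketched as below. Note that the text does not specify the distance metric, so the Chebyshev (square) neighborhood used here is an assumption:

```python
def visual_range(pos, vr):
    """All grid positions within distance vr of pos (Chebyshev metric assumed;
    the paper only states 'a maximum distance')."""
    x, y = pos
    return {(x + dx, y + dy)
            for dx in range(-vr, vr + 1)
            for dy in range(-vr, vr + 1)}

def army_knowledge(unit_positions, vr):
    """Global army information: the union of everything its soldiers can see."""
    known = set()
    for pos in unit_positions:
        known |= visual_range(pos, vr)
    return known
```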

Fights. Full-body combat is resolved at the soldier level by assigning a random value (in the interval [0,1]) to each unit involved in a fight with a rival soldier; the energy level of the unit with the lowest value decreases by 1 unit. A unit dies when its energy reaches zero. All the combats between soldiers are executed in sequence within one turn of the game.
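A direct transcription of this combat rule might look like the following sketch (the tie-handling behavior is an assumption, since the text does not cover equal draws):

```python
import random

def resolve_combat(u1, u2, rng=random):
    """Resolve one full-body combat: each unit draws a random value in [0, 1]
    and the unit with the lowest value loses 1 point of energy.
    Equal draws (probability ~0) leave both units untouched (an assumption)."""
    r1, r2 = rng.random(), rng.random()
    if r1 < r2:
        u1["energy"] -= 1
    elif r2 < r1:
        u2["energy"] -= 1
    # a unit is alive while its energy is above zero
    return u1["energy"] > 0, u2["energy"] > 0
```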

Decision-making. Throughout the game, the role of the players is to make decisions with respect to the actions that the units of their respective armies should take. More specifically, both the virtual and the human player can select a set of agents and assign it a specific order, which basically consists of a common action that each unit in the set has to execute.

Actions. The game is executed in turns; every turn, each soldier executes the last order received by its player. Six actions are possible in this game:

  • Move forward enemy: the unit moves towards the closest rival soldier within its visual range; otherwise (i.e., if no rival soldier is visible) the agent moves in the direction of the region where, according to the global information of its army, the most enemies are located.

  • Group/Run away: the soldier groups with other teammates placed inside its visual range; if there are no mates in this range, the agent moves towards the position where its flag is placed.

  • Move forward objective: the agent advances towards the opponent's flag position if this is known; otherwise it moves randomly.

  • No operation: execute no operation.

  • Explore: the soldier moves towards a not-yet-explored region of the map.

  • Protect flag: the soldier moves towards the position of its own flag; if it is already near it, the unit just scouts the zone.
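The six actions and the turn semantics (each soldier repeats the last order it received) can be sketched as follows; the enum member names and the order-book dictionary are our own illustrative choices:

```python
from enum import IntEnum

class Action(IntEnum):
    MOVE_TOWARD_ENEMY = 0      # "Move forward enemy"
    GROUP_OR_RUN_AWAY = 1      # "Group/Run away"
    MOVE_TOWARD_OBJECTIVE = 2  # "Move forward objective"
    NO_OP = 3                  # "No operation"
    EXPLORE = 4                # "Explore"
    PROTECT_FLAG = 5           # "Protect flag"

def game_turn(soldier_ids, last_orders, execute):
    """One game turn: every soldier executes the last order received from
    its player (soldiers with no order yet default to No operation)."""
    for sid in soldier_ids:
        execute(sid, last_orders.get(sid, Action.NO_OP))
```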

Perceptions. Each unit knows its specific situation in the game from the perceptions that it receives within its visual range. In our game, the agent state is determined by its health (measured in three range values: low, medium, and high) and the following boolean perceptions:

  1. Advantage state?: "the unit is in an advantageous situation". This perception is true if, according to the information in its visual range, the number of mate soldiers is higher than the number of rival units, and false otherwise.

  2. Under attack?: this perception is true if the soldier is involved in a full-body combat with an opponent unit, and false otherwise.

  3. Objective visible?: the opponent's flag is visible within its visual range.

Game objective. The army that first captures the rival flag is the winner; if this situation is never reached after a given number of turns, then the winner is the player whose army inflicts greater damage on its rival army, where damage is measured as the number of dead units.

2.1 Notes on specific issues

Observe that 'fighting' is not a specific action to be executed by the army units. This should not be surprising, because 'fighting' is not usually controlled by players in standard RTS games, as this action is typically executed at the unit level and not at the army level; in our game this action is automatically executed when a unit and a rival soldier meet in any cell of the scenario.

Also, note that five of the six possible actions that units can execute require movements in the game world. This basically means that some kind of pathfinding has to be performed to make the movement realistic. Classical algorithms such as A* were not good candidates for this task since, as already indicated, most of the scenario is not known to the army units. A practical solution was to let units make a good-enough movement depending on their knowledge of the scenario; several cases were covered: concave obstacles (the unit moves by considering different angles with respect to its position in the scenario); convex obstacles (the unit moves by following the obstacle contour in a specific direction); and bridges and narrow passages (a simple ant-colony algorithm was executed).
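The "good-enough movement" idea can be illustrated with a minimal greedy step; this is a simplification of the case-based scheme above (which distinguishes concave obstacles, convex obstacles, and narrow passages), and the function and parameter names are ours:

```python
def greedy_step(passable, pos, goal):
    """One 'good-enough' movement step under partial map knowledge.
    `passable(x, y)` reports whether a cell is known to be traversable."""
    x, y = pos
    neighbours = [(x + dx, y + dy)
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]
    candidates = [p for p in neighbours if passable(*p)]
    if not candidates:
        return pos  # boxed in: stay put this turn
    # move to the admissible neighbour closest (squared Euclidean) to the goal
    return min(candidates,
               key=lambda p: (p[0] - goal[0]) ** 2 + (p[1] - goal[1]) ** 2)
```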

Due to space limitations we will not explain these in detail, as this is beyond the objective of this paper. In any case, Figures 1(a), (b), and (c) show illustrative examples of the cases, Figure 1(d) illustrates the classical behavior of our ant-colony algorithm, and Figure 1(e) displays a screenshot of our wRTS game (a prototype version of this game, titled aracnia, is available for download). Note also that a description of the technical issues of the game (e.g., implementation of basic actions, graphical considerations, input/output interface, scene rendering, and memory management, among others) is beyond the scope of this paper. In the following we focus on the process of designing virtual players.

(a) Concave obstacles
(b) Convex obstacles
(c) bridges and passages
(d) Ant-colony
(e) Game
Figure 1: (a)-(c) Movement examples; (d) illustration of the classical execution of the ant-colony algorithm used for pathfinding; (e) game screenshot (black cells represent regions not yet explored, blue cells impassable cells, and green cells passable regions in the scenario).

3 Virtual player design

Here we describe our proposal to design adaptive virtual players in our wRTS.

3.1 General issues

Our aim is to generate controllers governing automatically (i.e., without human intervention) the behavior of an entire team (i.e., a set of autonomous agents/soldiers). An army controller can be viewed as a set of rules that will control the reactions of its agents according to the current state of play. Depending on its situation, the agent executes a particular action that may modify the state of the agent itself. The global team strategy emerges as the result of the sum of the specific behavior of all the agents.

However, the definition of specific strategies for each agent is complex, requires profound knowledge of the required behavior of each agent, and results in predictable behavior. Moreover, devising independent controllers for each unit in an RTS game is costly; in fact this is not realistic, as it requires a high computational effort that would surely decrease the quality of the real-time rendering of the game. One solution consists of designing global team strategies; however, this is again very complex, as the programmer has to cope with too many possibilities arising from the different interactions among all the agents of both armies. We have opted for a more realistic approach: all the soldiers are managed by the same controller, and at any instant of the game the team behavior is the result of summing all the actions taken by each of its constituent units. This means that we have to produce just one controller to devise a global team strategy. Note, however, that this does not mean that all the agents execute the same action, because the action to be executed by a soldier depends on its particular situation in the match.

3.2 A two phases process

The procedure described in the following has the aim of creating a virtual player whose behavior auto-evolves according to the skills that the human player exhibits along the game. Algorithm 1 displays the general schema of the process: initially (in the first game), the virtual player (VP) is constructed as an expert system (i.e., a rule-based prototype (RBP)) that comprises tens of rules trying to cover all the possible situations in which an agent might be involved. This system was the result of a number of trials designed from the experience of the authors playing wRTS games. Then, assuming the human player will play a number of games, the procedure consists of two phases that are sequentially executed. First, a player-modeling phase is conducted; this step is described in Section 3.3 and basically consists of building a model of the behavior that the human player exhibits during the game (i.e., on-line). The second phase, described in Section 3.4, is devoted to constructing the virtual player by evolving a strategy controlling the functioning of the virtual army via an evolutionary algorithm (EA). In the following we describe both phases in detail. First, we discuss how the virtual player is internally encoded, as this is important to understand the functioning of the whole schema.

VP ← RBP                                // rule-based expert system
for i ← 1 to n do
    PlayerModel ← PlayGameOn(VP)        // player modeling (PM), on-line
    VP ← EA(PlayerModel, VP)            // evolutionary optimization (EA), off-line
end for
return VP
Algorithm 1: PMEA(VP)


The virtual player contains an NPC army strategy, and its internal encoding corresponds both with the rule-based expert system and with any individual in the EA population. In the following we will refer to 'individual' in a general sense. Every individual represents a team strategy, that is to say, a common set of actions that each army unit has to execute under its specific situation in the environment. An individual is represented as a vector of n cells, where n is the number of different situations in which an agent can find itself in the game, and the i-th cell (for 1 ≤ i ≤ n) contains the action to be taken in situation i. In other words, assuming k perceptions, where the j-th perception (for 1 ≤ j ≤ k) can take v_j possible values, then n = v_1 · v_2 · ... · v_k, and the cell indexed by a given combination of perception values contains the action to be executed by a unit whose perceptions take exactly those values. As indicated in Section 2, the state of a unit in our wRTS game is determined by three boolean perceptions and its energy level (with 3 different values), so that the encoding of an individual consists of a vector with 3 · 2 · 2 · 2 = 24 genes; this vector will be called the answer matrix, and each cell in the vector contains an action. Remember that 6 possible actions can be executed in our wRTS, and thus the search space (i.e., the number of different strategies that can be generated from this representation) is 6^24. Figure 2 (left) displays an example of a possible encoding. The optimal solution (if it exists) would be the strategy that always selects the best action for the agents under all possible environmental conditions. In fact, this vast search space makes the problem impracticable for exact methods and justifies the use of evolutionary algorithms.
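This indexing scheme can be made concrete as follows (a sketch; the ordering of health levels and perceptions within the vector is our own choice, not the paper's):

```python
from itertools import product

HEALTH_LEVELS = ("low", "medium", "high")  # 3 energy/health levels
# three boolean perceptions: advantage? / under attack? / objective visible?
STATES = list(product(HEALTH_LEVELS, (False, True), (False, True), (False, True)))
assert len(STATES) == 24                   # 3 * 2 * 2 * 2 situations

NUM_ACTIONS = 6

def state_index(health, advantage, under_attack, objective_visible):
    """Map a unit's current perceptions to its cell in the 24-gene answer matrix."""
    h = HEALTH_LEVELS.index(health)
    return (((h * 2 + int(advantage)) * 2 + int(under_attack)) * 2
            + int(objective_visible))

# an individual (team strategy) is just a vector of 24 actions,
# so the number of candidate strategies is 6**24
search_space = NUM_ACTIONS ** len(STATES)
```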

Figure 2: Example of an arbitrary encoding (left), and extended answer matrix (right).

3.3 On-line phase: User modeling

The player behavior is modeled as a rule-based strategy encoded as explained above. This process requires collecting, during the execution of the game (i.e., in real time), the actions that the human player executes, additionally recording the specific conditions under which they are performed. At the end of this process we generate an extended answer matrix, which is an answer matrix (i.e., an individual encoding) where each cell i (for 1 ≤ i ≤ n) now represents a vector of 6 positions (one per action) whose a-th entry contains the probability that the human player executes action a under the environmental conditions (i.e., the state) associated with cell i. Figure 2 (right) displays an example showing the probability of executing each of the 6 possible actions in a specific situation of the unit (i.e., the soldier has medium energy, is in an advantage state, is not suffering an attack, and knows where the opponent flag is placed). This extended answer matrix is finally used to design the virtual player: for each possible situation i, the virtual player is assigned the action with the highest recorded probability in cell i.
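A possible implementation of this on-line modeling step is sketched below; the class name and frequency-counting representation are our own, and the final step assumes the virtual player takes, per state, the most probable observed action:

```python
class PlayerModel:
    """Extended answer matrix built from observed (state, action) pairs
    (an illustrative sketch; counts are normalized into probabilities)."""
    def __init__(self, n_states=24, n_actions=6):
        self.counts = [[0] * n_actions for _ in range(n_states)]

    def observe(self, state, action):
        # called each time the human player issues an order during the game
        self.counts[state][action] += 1

    def probabilities(self, state):
        row = self.counts[state]
        total = sum(row)
        if total == 0:                      # state never observed: uniform
            return [1.0 / len(row)] * len(row)
        return [c / total for c in row]

    def to_answer_matrix(self):
        # derive the virtual player: for each state, the most probable action
        return [max(range(len(row)), key=row.__getitem__) for row in self.counts]
```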

3.4 Off-line phase: evolutionary optimization

Algorithm 2 shows the basic schema of our EA. The initial population is randomly generated, except for one individual that is seeded with the current virtual player. Then a classical process of evolution is performed, and the best solution in the population is finally returned (ties are broken randomly).

for i ← 1 to popsize do
    P[i] ← Random-Solution()
end for
P[1] ← VP                                  // seed the population with the current VP
g ← 0
while g < MaxGenerations do
    Rank-Population(P)                     // sort population according to fitness
    parent1 ← Select(P)                    // roulette wheel
    parent2 ← Select(P)
    if rand() < pX then                    // recombination is done with probability pX
        (child1, child2) ← Recombine(parent1, parent2)
    else
        (child1, child2) ← (parent1, parent2)
    end if
    Mutate(child1, pM)                     // pM = mutation probability
    Mutate(child2, pM)
    Evaluate(child1)
    Evaluate(child2)
    P ← best popsize of P ∪ {child1, child2}   // (popsize + 2) replacement
    g ← g + 1
end while
return best solution in P
Algorithm 2: EA(PModel, VP) // PModel = player model, VP = virtual player

Evaluating the fitness of an individual requires the (off-line) simulation of a game between the player model and the virtual player strategy encoded in the individual. The fitness function depends on the statistical data collected at the end of the simulation. The more statistical data collected, the higher the computational cost; a priori, a good policy is to consider a limited number of data. Four values were used in our experiments during off-line evaluation: A, the number of deaths in the human player's army; B, the number of deaths in the virtual player's army; C, the number of movements; and D, the victory degree (i.e., 1 if the virtual player wins and 2 otherwise). The fitness function was then defined as a combination of these four values; the higher the fitness value, the better the strategy. This fitness was coded to evolve towards aggressive solutions.
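Since the exact fitness expression is not reproduced above, the following sketch only illustrates its shape: a weighted combination of A, B, C, and D that rewards aggressive play; the weights are purely hypothetical:

```python
def fitness(stats, weights=(10.0, 10.0, 0.01, 100.0)):
    """Illustrative fitness shape only; the exact expression and weights used
    in the paper are not given in the text, so these values are hypothetical.
    A = deaths in the HP army, B = deaths in the VP army, C = movements,
    D = victory degree (1 if the VP wins, 2 otherwise). Higher is better:
    aggressive, winning, short games score highest."""
    A, B, C, D = (stats["hp_deaths"], stats["vp_deaths"],
                  stats["moves"], stats["victory"])
    wa, wb, wc, wd = weights
    return wa * A - wb * B - wc * C + wd * (2 - D)
```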

4 Experimental analysis

The experiments were performed using two algorithms: our initial expert system (RBP) and the algorithm PMEA (i.e., player modeling + EA) presented in Algorithm 1. As for PMEA, the EA was run with MaxGenerations = 125; mutation is executed, as usual, at the gene level by changing an action to any other action randomly chosen. Three different scenarios were created for experimentation: (1) a map with 48 agents in the VP army and 32 soldiers in the human player (HP) team; (2) a map with 43 VP soldiers and 43 HP units; and (3) a map with 48 VP soldiers and 53 HP units. Algorithm 1 was executed for n = 20 (i.e., 20 different games were sequentially played), and the RBP was also executed 20 times; Table 1 shows the results obtained.

Scenario   Algorithm   V_VP   V_HP   D_HP   D_VP    Mov      T
   1       RBP           4     16      6      7     5345    3.56
   1       PMEA          6     14      7      7     4866    3.20
   2       RBP           9     11      4      3     7185    4.79
   2       PMEA          7     13      6      7     5685    3.80
   3       RBP           3     17      3      2     6946    4.63
   3       PMEA          6     14      7      6     6056    3.78
Table 1: Results: V_VP = number of virtual player's victories, V_HP = number of human player's victories, D_HP = average number of deaths in the HP army, D_VP = average number of deaths in the VP army, Mov = average number of movements, and T = average time (minutes) per game.

Even though PMEA behaves better than RBP in two of the three scenarios, note that no significant differences are shown; this is, however, an expected result, as we have considered just one player, which means that the player models obtained in between two games are likely similar, and thus so are their corresponding virtual players. In any case, this demonstrates that our approach is feasible, as it produces virtual players comparable to (and sometimes better than) specific, specialized pre-programmed scripts.

5 Conclusions

We have described an algorithm to automatically design strategies exhibiting emergent behaviors that adapt to the user's skills in a one-player war real-time strategy (wRTS) game; to do so, a model mimicking how the human player acts in the game is first constructed during the game, and then a strategy for the virtual player is evolved (in between games) via an evolutionary algorithm.

Our proposal was compared with an expert system designed specifically for the game. Whereas no significant differences were highlighted in the experiments, we note that our approach has evident advantages over the classical hand-crafted scripts (i.e., expert systems) used in the videogame industry: for instance, it avoids the predictability of the actions to be executed by the virtual player and thus helps maintain the interest of the player. This is especially interesting when the game involves more than one player, as our approach would allow us to construct virtual players adapted to each particular human player (something that cannot be obtained with a pre-programmed script).

Further research will address multi-player games, for which multi-objective evolutionary programming techniques should be considered.


This work is supported by project NEMESIS (TIN-2008- 05941) of the Spanish Ministerio de Ciencia e Innovación, and project TIC-6083 of Junta de Andalucía.