Since the beginning of the development of AI systems, games have been natural benchmarks for AI algorithms because they provide well-defined rules and allow for easy evaluation of an agent’s performance. The number of games mastered by AI algorithms has grown rapidly in recent years, and algorithms like AlphaGo [29, 30] have beaten the best human players in increasingly complex board games that had previously been deemed too sophisticated for computers. We have also witnessed major successes in applying Deep Reinforcement Learning to arcade games [21, 20, 25], in some of which machines, yet again, surpass humans. However, AI agents faced with complex first-person-perspective 3D environments do not yet come close to human performance. The disparity is most striking when the simultaneous use of multiple skills is required, e.g., navigation, localization, memory, self-awareness, exploration, or precision. Obtaining these skills is particularly important considering the potential applicability of self-learning systems to robots acting in the real world.
Still, despite this limited real-world applicability, a large body of reinforcement learning research has concentrated on 2D Atari-like games and abstract classical games. This is caused, in part, by the scarcity of suitable environments and established benchmarks for harder settings.
Introduced in early 2016, the Doom-based ViZDoom was the first published environment that aimed to provide a complex 3D first-person perspective platform for Reinforcement Learning (RL) research. ViZDoom was created in response to the scarcity of studies on RL in complex 3D environments from raw visual information. The flexibility of the platform has led to dozens of RL research works that used it as an experimental platform. It has also triggered the development of other realistic 3D worlds and platforms suitable for machine learning research that have appeared since its initial release, such as the Quake-based DeepMind Lab and the Minecraft-based Project Malmo, which follow principles similar to ViZDoom’s. Related are also environments that focus on the task of house navigation using raw visual information with very realistic renderers: House3D, AI2-THOR, HoME, CHALET, and UnrealCV. However, most of these environments focus only on that particular task and lack extensibility and flexibility.
An effective way to promote research in complex environments such as ViZDoom is to organize open competitions. In this paper, we describe the first two editions of the Visual Doom AI Competition (VDAIC), held during the Conference on Computational Intelligence and Games in 2016 and 2017. The unique feature of this annual competition is that the submitted bots compete in multi-player matches with the screen buffer as their only source of information about the environment. The task requires effective exploration and navigation through a 3D environment, gathering resources, dodging missiles and bullets, and, last but not least, accurate shooting. Such a setting also implies sparse and often delayed rewards, which is a challenging case for most popular RL methods. The competition was one of a kind at the time, as it combined 3D vision with the multi-player aspect. It can be regarded as an ‘AI e-sport’, merging the trending e-sports with events such as driverless car racing.
II The ViZDoom Research Platform
II-A Design Requirements
The ViZDoom reinforcement learning research platform has been developed to fulfill the following requirements:
based on popular open-source 3D FPS game (ability to modify and publish the code),
lightweight (portability and the ability to run multiple instances on a single machine with minimal computational resources),
fast (the game engine should not be the learning bottleneck by being capable of generating samples hundreds or thousands of times faster than real-time),
total control over the game’s processing (so that the game can wait for the bot decisions or the agent can learn by observing a human playing),
customizable resolution and rendering parameters,
multi-player games capabilities (agent vs. agent, agent vs. human and cooperation),
easy-to-use tools to create custom scenarios,
scripting language to be able to create diverse tasks,
bindings for different programming languages (with the core preferably written in C++).
In order to meet the above-listed criteria, we analyzed several recognizable FPS game engines: Doom, Doom 3, Quake III, Half-Life 2, Unreal Tournament, and Cube. Doom (see Fig. 1), with its low system requirements, simple architecture, multi-platform support, and single- and multi-player modes, met most of the conditions (see  for a detailed analysis) and allowed us to implement features that would be barely achievable in other game engines, e.g., off-screen rendering, high efficiency, and easy-to-create custom scenarios. The game is highly recognizable and runs on the three major operating systems. It was also designed to work in low resolutions and, although modern implementations allow higher ones, it still utilizes low-resolution textures, which keeps its resource requirements modest.
A nowadays-unique feature of Doom is its software renderer. This is especially important for reinforcement learning algorithms, which are often distributed across CPUs rather than GPUs. Yet another advantage of CPU rendering is that Doom can effortlessly be run without a desktop environment (e.g., remotely, in a terminal), and accessing the screen buffer does not require transferring it from the graphics card.
Technically, ViZDoom is based on ZDoom (zdoom.org), a modernized, open-source version of Doom’s original engine, which has been actively supported and developed since 1998. The large community gathered around the game and the engine has provided many tools that facilitate creating custom scenarios.
ViZDoom provides features that can be exploited in a wide range of AI and, in particular, machine learning experiments. It allows for different control modes, custom scenarios, and access to additional information concerning the scene, including per-pixel depth (depth buffer), visible objects, and a top-down view map. In the following sections, we list the most important features of ViZDoom 1.1.5, which substantially extend those of the initial 1.0 version.
II-B1 Control Modes
ViZDoom provides four control modes: i) synchronous player, ii) synchronous spectator, iii) asynchronous player, and iv) asynchronous spectator.
In the asynchronous modes, the game runs at a constant rate (35 frames per second) and, if the agent reacts too slowly, it can miss one or more frames. Conversely, if it makes a decision too quickly, it is blocked until the next frame arrives from the engine. Thus, for reinforcement learning research, it is more efficient to use the synchronous modes, in which the game engine waits for the decision maker. This way, the learning system can learn at its own pace and is not limited by any temporal constraints.
Importantly, for experimental reproducibility and debugging purposes, the synchronous modes run fully deterministically.
In the player modes, it is the agent who makes actions during the game. In contrast, in the spectator modes, a human player is in control, and the agent only observes the player’s actions.
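The practical difference between the modes can be illustrated with a toy timing model (pure Python, not the ViZDoom API): an asynchronous engine advances at a fixed rate regardless of the agent, so a slow agent misses frames, while a synchronous engine waits and every frame is observed.

```python
# Toy illustration of asynchronous vs. synchronous stepping (this is a
# stand-in model, not the ViZDoom API).

def frames_seen_async(total_frames, decision_time_tics):
    """Frames an agent observes when each decision takes
    `decision_time_tics` engine tics and the engine never waits."""
    seen, t = [], 0
    while t < total_frames:
        seen.append(t)                    # observe the current frame
        t += max(1, decision_time_tics)   # the engine moved on meanwhile
    return seen

def frames_seen_sync(total_frames, decision_time_tics):
    """In synchronous mode the engine waits for the agent, so every
    frame is observed regardless of decision time."""
    return list(range(total_frames))

fast = frames_seen_async(10, 1)   # fast agent: sees all 10 frames
slow = frames_seen_async(10, 3)   # slow agent: sees only every 3rd frame
sync = frames_seen_sync(10, 3)    # sync mode: the slow agent still sees all 10
```

The model also shows why the synchronous modes can run fully deterministically: the engine never advances without an explicit decision.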
ViZDoom provides support for both single- and multi-player games, which support up to sixteen agents playing simultaneously on the same map, communicating over a network. As of version 1.1 of ViZDoom, multi-player games can be run in all modes (including the synchronous ones). Multi-player can involve deathmatch, team deathmatch, or fully cooperative scenarios.
II-B2 Scenarios
One of the most important features of ViZDoom is the ability to run custom scenarios, which are not limited to just playing Doom. This includes creating appropriate maps (“what the world looks like”), programming the environment’s mechanics (“when and how things happen”), defining terminal conditions (e.g., “killing a certain monster”, “getting to a certain place”, “getting killed”), and rewards (e.g., for “killing a monster”, “getting hurt”, “picking up an object”). This mechanism opens endless experimentation possibilities. In particular, it allows researchers to create scenarios of difficulty on par with the capabilities of state-of-the-art learning algorithms.
The creation of scenarios and maps is possible thanks to easy-to-use software tools developed by the Doom community. The two recommended free tools are Doom Builder 2 (http://www.doombuilder.com) and SLADE 3 (http://slade.mancubus.net). Both visual editors make it easy to define custom maps and to code the game mechanics in Action Code Script, for both single- and multi-player games. They also make it possible to conveniently test a scenario without leaving the editor.
While arbitrary rewards and constraints can be implemented using the scripting language, ViZDoom provides a direct way of setting the most typical kinds of rewards (e.g., for “living” or “dying”), constraints on the elementary actions/keys that can be used by the agent, and temporal constraints such as the maximum episode duration. Scenarios do not affect any rendering options (e.g., the screen resolution or the crosshair visibility), which can be customized in configuration files or via the API.
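As an illustration, such settings can be declared in a plain-text scenario configuration file; the sketch below uses key names from ViZDoom’s documented configuration format, but the scenario and values are made up:

```
# illustrative scenario configuration (values are invented)
doom_scenario_path = my_scenario.wad
living_reward = -1            # small penalty per tic to encourage speed
death_penalty = 100           # penalty for dying
episode_timeout = 2100        # maximum episode length in tics
available_buttons = { MOVE_FORWARD TURN_LEFT TURN_RIGHT ATTACK }
screen_resolution = RES_320X240
render_hud = false
```

Rendering options like `screen_resolution` and `render_hud` live alongside the reward settings here, but, as noted above, they can also be changed via the API independently of the scenario.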
ViZDoom comes with more than a dozen predefined scenarios allowing for the training of fundamental skills like shooting or navigation.
II-B3 Automatic Objects Labeling
The objects in the current view can be automatically labeled and provided as a separate input channel together with additional information about them (cf. Fig. 2). The environment also provides access to a list of labeled objects (bounding boxes, their names, positions, orientations, and directions of movement). This feature can be used for supervised learning research.
II-B4 Depth Buffer
ViZDoom provides access to the renderer’s depth buffer (see Fig. 2), which may be used to simulate the distance sensors common in mobile robots and help an agent to understand the received visual information. This feature gives an opportunity to test whether an agent can autonomously learn the whereabouts of the objects in the environment.
II-B5 Top-Down View Map
ViZDoom can also render a top-down representation (a map) of the episode’s environment. The map can be set up to display the entire environment or only the part already discovered by the agent. In addition, it can show objects including their facings and sizes. This feature can be used to test whether the agents can efficiently navigate in a complex 3D space, which is a common scenario in mobile robotics research. Optionally, it allows turning ViZDoom into a 2D environment, which eliminates the need for an auxiliary 2D environment for simpler scenarios.
II-B6 Off-Screen Rendering and Frame Skipping
To facilitate computationally heavy machine learning experiments, ViZDoom is equipped with off-screen rendering and frame-skipping features. Off-screen rendering removes the performance burden of displaying the game on the screen and makes it possible to run experiments on servers (no graphical interface required). Frame skipping, on the other hand, allows some frames to be omitted from rendering entirely, since an effective bot typically does not need to see every single frame.
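The frame-skipping idea can be sketched as a wrapper that repeats the chosen action for several frames and returns only the last state with the accumulated reward (the wrapper and the stand-in environment below are ours, not part of ViZDoom’s API):

```python
# Frame-skipping sketch: repeat one action for `skip` frames and return
# only the final state plus the accumulated reward.

class FrameSkipWrapper:
    def __init__(self, env, skip=4):
        self.env, self.skip = env, skip

    def step(self, action):
        total_reward, state, done = 0.0, None, False
        for _ in range(self.skip):
            state, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        return state, total_reward, done

class CountingEnv:
    """Trivial stand-in environment: reward 1 per frame, ends at frame 10."""
    def __init__(self):
        self.t = 0
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10

env = FrameSkipWrapper(CountingEnv(), skip=4)
s, r, d = env.step("forward")   # advances 4 frames at once
```

With `skip=4`, the learner makes one decision per four engine frames, which roughly quadruples the simulation throughput seen by the learning algorithm.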
II-B7 ViZDoom’s Performance
Our performance tests show that, on a modern CPU (single-threaded), ViZDoom can render thousands of frames per second on average in most of the scenarios, and even more at low resolutions (cf. Table I).
The main factors affecting ViZDoom’s performance are the rendering resolution and the computation of the additional buffers (depth, labels, and the top-down view map). At low resolutions, the time needed to render one frame is negligible compared to the backpropagation time of any reasonably complex neural network. It is also worth mentioning that a single instance of ViZDoom requires only a dozen or so MB of RAM, which allows many instances to run simultaneously.
II-B8 Recording and Replaying Episodes
Last but not least, ViZDoom games can be effortlessly recorded and saved to disk to be replayed later. During playback, all buffers, rewards, game variables, and the player’s actions can be accessed just as in the spectator mode, which is useful for learning from demonstration. Moreover, the rendering settings (e.g., the resolution, textures, etc.) can be changed at replay time, which is useful for preparing high-resolution demonstration movies.
II-C Application Programming Interface (API)
The ViZDoom API is flexible and easy to use. It was designed to conveniently support reinforcement learning and learning-from-demonstration experiments and, therefore, provides full control over the underlying Doom process. In particular, it is possible to retrieve the game’s screen buffer and to perform actions that correspond to keyboard buttons (or their combinations) and mouse actions. Some game state variables, such as the player’s health or ammunition, are available directly.
The ViZDoom API was written in C++. It offers a myriad of configuration options such as control modes and rendering options. In addition to C++, bindings for Python, Lua, Java, and Julia are provided. A sample code using the Python API with a randomly behaving bot is shown in Fig. 3.
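Fig. 3 shows the actual Python API; to illustrate the same control flow without requiring the vizdoom package, the sketch below mimics it with a stub game class (the method names are modeled on the public API, but the stub itself and its rewards are ours):

```python
import random

class StubDoomGame:
    """Minimal stand-in for a DoomGame-like object, just enough for the loop."""
    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0
        self.total_reward = 0.0
    def new_episode(self):
        self.t, self.total_reward = 0, 0.0
    def get_state(self):
        return {"screen_buffer": [[0] * 4] * 3, "tic": self.t}  # fake pixels
    def make_action(self, action):
        self.t += 1
        reward = 1.0 if action[2] else 0.0  # invented reward for ATTACK
        self.total_reward += reward
        return reward
    def is_episode_finished(self):
        return self.t >= self.episode_length
    def get_total_reward(self):
        return self.total_reward

# Random bot: each action is a binary vector over the available buttons,
# here [MOVE_LEFT, MOVE_RIGHT, ATTACK].
game = StubDoomGame()
game.new_episode()
while not game.is_episode_finished():
    state = game.get_state()   # screen buffer and other state
    action = [random.random() < 0.5 for _ in range(3)]
    game.make_action(action)
episode_return = game.get_total_reward()
```

The real API follows the same loop: initialize, start an episode, alternate between reading the state and issuing an action until the episode ends.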
III Visual RL Research Platforms
The growing interest in Machine Learning and Reinforcement Learning that gave rise to ViZDoom has recently triggered the development of a number of other environments designed for RL experimentation.
DeepMind Lab and Project Malmo are the closest counterparts of ViZDoom, since they both involve the first-person perspective. Like ViZDoom, they are based on popular games. Project Malmo has been developed on top of Minecraft, while DeepMind Lab uses Quake 3 Arena, which was also considered for the ViZDoom project but was rejected due to its limited scripting capabilities and developer-unfriendly client-server architecture, which also limits the environment’s performance. DeepMind Lab vastly extends Quake’s scripting capabilities and adds custom resources that overhaul the look of the environment, which makes DeepMind Lab more detached from its base game than ViZDoom is from Doom.
All three platforms allow for defining custom scenarios. Project Malmo is particularly flexible in this respect, giving a lot of freedom by providing full access to the state of the environment at runtime. Unfortunately, in contrast to ViZDoom, no visual editor is available for Project Malmo. DeepMind Lab’s scripting allows complex environments to be built on top of it (e.g., Psychlab). For creating custom geometry, the original Quake 3 Arena visual editor can be adapted, but it is burdensome to use. ViZDoom, on the other hand, is compatible with the accessible tools of the Doom community, which allow for creating levels and scripting them. However, due to engine limitations, creating some types of scenarios may require more work than in the two aforementioned environments (e.g., scenarios with a lot of randomized level geometry).
All three platforms provide an RL-friendly API in a few programming languages and give access to the depth buffer, but only ViZDoom makes it possible to obtain detailed information about the objects visible on the screen and offers a top-down view map.
UnrealCV is yet another interesting project that offers high-quality 3D rendering. It provides an API for Unreal Engine 4 that enables users to obtain the environment state, including not only the rendered image but also a whole set of auxiliary scene information such as the depth buffer or the scene objects. UnrealCV is not a self-contained environment with existing game mechanics and resources; it must be attached to an Unreal Engine-based game. This characteristic allows custom scenarios to be created as separate games directly in the Unreal Engine Editor using its robust visual scripting tools. However, since UnrealCV is designed to be a universal tool for computer vision research, it does not provide any RL-specific abstraction layer in its API.
Unity ML-Agents (https://unity3d.com/machine-learning) is the most recent project and is still in development (currently in beta). It follows a principle similar to UnrealCV’s, providing a Python API for scenarios that can be created using the Unity engine. However, like Project Malmo, it aims to be a more general RL platform with a flexible API. It allows a user to create a wide range of scenarios, including learning from visual information.
While ViZDoom’s graphics are the most simplistic among all major first-person perspective environments, this also makes it very lightweight, allowing multiple instances of the environment to run with only a small amount of computational resources. Among the available environments, it is the most computationally efficient, which is an important practical aspect of experimentation. A detailed comparison of the environments can be found in Table I.
| Feature | ViZDoom | DeepMind Lab | Project Malmo | OpenAI Universe | UnrealCV | Unity ML-Agents |
|---|---|---|---|---|---|---|
| Base Game/Engine | Doom/ZDoom | Quake 3/ioquake3 | Minecraft | Many | Unreal Engine 4 | Unity |
| Public release date | March 2016 | December 2016 | May 2016 | December 2016 | October 2016 | September 2017 |
| API languages | Python, Lua, C++, Java, Julia | Python, Lua | Python, Lua, C++, C#, Java | Python | Python, MatLab | Python |
| Game customization capabilities | ✓ | ✓ | Limited | - | ✓ | ✓ |
| Scenario editing tools | Visual Editors | Text + Visual Editor | XML defined | - | Unreal Engine 4 Editor | Unity Editor |
| Framerate (frames/s) | 2500 (CPU) | 100 (CPU)/800 (GPU) | 120 (GPU) | 60 (GPU, locked) | Depends | Depends |
| Auto object labeling | ✓ | - | - | - | ✓ | - |
| Low-level API | ✓ | ✓ | ✓ | - | ✓ | - |
| RL-friendly abstraction layer | ✓ | ✓ | ✓ | ✓ | - | ✓ |

– platform allows creating scenarios with varying graphical detail and thus varying requirements; – platform is open-source, although the code of most base games is closed; – platform is open-source, although the Unity engine code is closed.
IV Visual Doom AI Competitions (VDAIC)
Doom has been considered one of the most influential titles in the game industry, since it popularized the first-person shooter (FPS) genre and pioneered immersive 3D graphics. Even though more than 25 years have passed since Doom’s release, the methods for developing AI bots have not improved qualitatively in newer FPS productions. In particular, bots still “cheat” by accessing the game’s internal data, such as maps, locations of objects, positions of (player or non-player) characters, and various metadata. In contrast, a human can play FPS games using the computer screen as the sole source of information, although sound effects might also be helpful. To encourage the development of bots that act only on raw visual information and to evaluate the state of the art in visual reinforcement learning, two AI bot competitions were organized at the IEEE Conference on Computational Intelligence and Games in 2016 and 2017.
IV-B Other AI Competitions
There have been many game-based AI contests in the past. Recent examples include the General Video Game AI (GVGAI), StarCraft, Pac-Man, and Text-Based Adventure competitions. Each of them provides a different set of features and constraints.
Most of the contests give access to high-level representations of the game state, which are usually discrete. VDAIC is uncommon here, since it requires playing directly and solely from raw, high-dimensional pixel information representing a 3D scene (the screen buffer).
Many competitions concentrate on planning. For instance, GVGAI provides an API that allows sampling from a forward model of the game. This turned the competition into an excellent benchmark for variants of Monte Carlo Tree Search.
The StarCraft AI competition shares the real-time aspect of VDAIC but, similarly to GVGAI, focuses on planning based on high-level state representations. This is reflected in the proposed solutions, which involve state search and hand-crafted strategies. A few competitions, like Learning to Run and the learning track of GVGAI, target model-free settings (typical RL settings). However, both of them still provide access to relatively high-level observations.
It is apparent that the Visual Doom AI Competition has been unique and has filled a gap in the landscape of AI challenges by requiring bots to both perceive and plan in real-time in a 3D multi-agent environment.
IV-C Edition 2016
The task of the competition was to develop a bot that competes in a multi-player deathmatch game with the aim of maximizing the number of frags, which, by Doom’s definition, is the number of killed opponents decreased by the number of committed suicides (a bot commits suicide when it dies from damage inflicted by its own actions). The participants were allowed to prepare their bots offline and use any training method, external resources such as custom maps, and all features of the environment (e.g., the depth buffer, custom game variables, Doom’s built-in bots). During the contest, however, bots were allowed to use only the screen buffer (the left-most view in Fig. 2) and the information available on the HUD, such as the ammunition supply, the remaining health points, etc. Participants could configure the screen format (resolution, colors) and rendering options (crosshair, blood, HUD visibility, etc.). All features of the environment that provide the bots with information typically inaccessible to human players were blocked. The participants were allowed to choose between two sets of textures: the original ones and freeware substitutes.
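The scoring rule can be written down directly (the numbers below are made up for illustration):

```python
# Doom's scoring, as used in the competition: frags = kills - suicides.
# The F/D ratio reported in the result tables divides frags by deaths.

def frags(kills, suicides):
    return kills - suicides

def frags_to_deaths_ratio(kills, suicides, deaths):
    return frags(kills, suicides) / deaths  # deaths assumed > 0

score = frags(kills=10, suicides=3)                              # -> 7
ratio = frags_to_deaths_ratio(kills=10, suicides=3, deaths=7)    # -> 1.0
```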
For the evaluation, the asynchronous multi-player mode was used (a real-time game at 35 frames per second). Each bot had a single computer at its exclusive disposal (Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, 16GB RAM, Nvidia GTX 960 4GB). Participants could choose either Ubuntu 16.04 or Windows 10 and provide their code in Python, C++, or Java.
The competition consisted of two independent tracks (see Section IV-C2), each comprising 12 matches lasting 10 minutes (2 hours of gameplay per track in total). Bots started every match together and, after death, were respawned immediately at one of the respawn points, selected to be as far as possible from the other players. Additionally, bots were invulnerable to attacks for the first two seconds after respawning.
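The respawn rule can be sketched as picking the spawn point whose minimal distance to any living player is largest (the coordinates and helper function below are ours):

```python
import math

def farthest_spawn(spawn_points, players):
    """Pick the spawn point maximizing the minimal distance to any player."""
    def min_dist(p):
        return min(math.dist(p, q) for q in players) if players else float("inf")
    return max(spawn_points, key=min_dist)

spawns = [(0, 0), (10, 0), (5, 9)]
players = [(1, 1), (9, 1)]
chosen = farthest_spawn(spawns, players)   # (5, 9): farthest from both players
```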
Track 1: Limited Deathmatch on a Known Map
The agents competed on a single map, known in advance. The only available weapon was a rocket launcher, with which the agents were initially equipped. The map consisted mostly of relatively constricted spaces, which allowed killing an opponent by hitting a nearby wall with a rocket (blast damage). For the same reason, it was relatively easy to kill oneself. The map contained resources such as ammunition, medikits, and armors. Since the number of participants (9) exceeded ViZDoom 1.0’s upper limit of 8 players per game, a fair matchmaking scheme was developed: for each of the first 9 matches, a single bot was excluded; for the remaining 3 matches, the 2 bots that had performed worst in the first 9 matches were excluded.
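The resulting schedule can be sketched as follows (the bot names are placeholders):

```python
# Sketch of the 2016 Track 1 matchmaking: 9 entrants, 8-player limit.
# Each of the first 9 matches excludes a different bot; the last 3
# matches exclude the two bots that performed worst so far.

def first_round_schedule(bots):
    """One match per bot, with that bot sitting out."""
    return [[b for b in bots if b != excluded] for excluded in bots]

def final_round_schedule(bots, worst_two, matches=3):
    lineup = [b for b in bots if b not in worst_two]
    return [lineup] * matches

bots = [f"bot{i}" for i in range(9)]
round1 = first_round_schedule(bots)                     # 9 matches, 8 players each
round2 = final_round_schedule(bots, {"bot7", "bot8"})   # 3 matches, 7 players each
```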
Track 2: Full Deathmatch on Unknown Maps
The agents competed four times on each of three previously unseen maps (see Fig. 4) and were initially equipped with pistols. The maps were relatively spacious and contained open areas, which made accurate aiming more relevant than in Track 1. The maps contained various weapons and items such as ammunition, medikits, and armors. A sample map was provided. All maps were prepared by the authors of the competition.
Notice that Track 2 was considerably harder than Track 1. During the evaluation, the agents faced completely new maps, so they could not learn the environment by heart during training as in Track 1. And while moving randomly and aiming well was enough to be fairly effective in Track 1, a competent Track 2 player should make strategic decisions such as where to go, which weapon to use, and whether to explore or wait.
The bots in 7 out of the 9 submissions were competent enough to systematically eliminate the opponents. Among them, four bots stood out by scoring more than 300 frags: F1, Arnold, Clyde, and TUHO. The difference between the second (Arnold) and the third (Clyde) place was minuscule (413 vs. 393 frags), and it is questionable whether the order would remain the same if the games were repeated. There is no doubt, however, that F1 was the best bot, beating the runner-up (Arnold) by a large margin. F1 was also characterized by the lowest number of suicides. Note, however, that the number of suicides was generally high for all the agents. Interestingly, despite the fact that F1 scored best, it was Arnold who was gunned down the least often.
In Track 2, IntelAct was the best bot, significantly surpassing its competitors on all maps. Arnold, who finished in second place, was killed the least frequently. Compared to Track 1, the numbers of kills and suicides (see Tables II and III) are significantly lower, which is due to the reduced use of rocket launchers, which are dangerous not only to the opponent but also to the owner.
[Tables II and III (Track 1 and Track 2 results): Place, Bot, Total Frags, F/D ratio, Kills, Suicides, Deaths]
IV-C4 Notable Solutions
| Bot | Framework | Method |
|---|---|---|
| IntelAct | Tensorflow | Direct Future Prediction (DFP) |
| F1 | Tensorflow + Tensorpack | A3C, curriculum learning |
| TUHO | Theano + Lasagne | DQN + Haar detector |
| 5vision | Theano + Lasagne | DARQN |
| Ivomi | Theano + Lasagne | DQN |
Table IV contains a list of bots submitted to the competition. Below, the training methods for the main bots are briefly described:
F1 (Yuxin Wu, Yuandong Tian), the winning bot of Track 1, was trained with a variant of the A3C algorithm with curriculum learning. The agent was first trained on an easier task (weak opponents, a smaller map) and then gradually faced harder problems (stronger opponents, bigger maps). Additionally, some behaviors were hardcoded (e.g., increased movement speed when not firing).
IntelAct, the winner of Track 2, was trained with Direct Future Prediction (DFP). DFP is structured like typical deep RL approaches, but instead of maximizing the expected reward, the agent tries to predict a future measurement vector (e.g., health, ammunition) for each action, based on the current measurement vector and the environment state. The agent’s actual goal is defined as a linear combination of the future measurements and can be changed on the fly without retraining. Except for weapon selection, which was hardcoded, all behaviors were learned from playing against bots on a number of different maps. The core idea of DFP is related to UNREAL in that, in addition to the reward, it predicts auxiliary signals.
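A minimal sketch of DFP-style action selection, assuming hypothetical measurement predictions (the networks that would produce them are omitted; all names and numbers below are ours):

```python
# DFP-style action selection sketch: the agent scores each action by the
# dot product between its predicted future measurements and a goal
# vector. Changing the goal changes behavior without any retraining.

def choose_action(predicted_measurements, goal):
    """predicted_measurements: {action: [m1, m2, ...]}, a flattened vector
    of predicted future measurements; goal: weights of the same length."""
    def score(action):
        preds = predicted_measurements[action]
        return sum(p * g for p, g in zip(preds, goal))
    return max(predicted_measurements, key=score)

# Hypothetical predictions of [future health, future ammo] per action:
preds = {"advance": [40.0, 10.0], "retreat": [80.0, 2.0]}
aggressive = choose_action(preds, goal=[0.1, 2.0])   # values ammo -> "advance"
cautious = choose_action(preds, goal=[1.0, 0.0])     # values health -> "retreat"
```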
Arnold (Guillaume Lample, Devendra Singh Chaplot) took second place in both tracks (two sets of parameters and the same code). The training algorithm comprised two modules: navigation, obtained with DQN, and aiming, trained with Deep Recurrent Q-Learning (DRQN). Additionally, the aiming network has an extra output: a binary classifier indicating whether an enemy is visible on the screen. Based on the classifier’s decision, either the navigation or the aiming network decided upon the next action. The navigation network was rewarded for speed and penalized for stepping on lava, whereas the aiming network was rewarded for killing and picking up objects, and penalized for losing health or ammunition. It is worth noting that Arnold crouched constantly, which gave him a significant advantage over the opponents. This is reflected in his high frags-to-deaths ratio, especially for Track 2 (see Tables II and III). Arnold was trained on a set of maps created by the solution’s authors, starting with maps full of weak enemies.
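The gating between the two modules can be sketched as follows (the stand-in policies below are trivial placeholders, not the trained networks):

```python
# Sketch of Arnold-style two-module control: a binary enemy-visibility
# signal gates between the aiming policy and the navigation policy.

def gated_step(frame, enemy_visible, aim_policy, nav_policy):
    return aim_policy(frame) if enemy_visible else nav_policy(frame)

aim = lambda frame: "ATTACK"          # placeholder for the aiming network
nav = lambda frame: "MOVE_FORWARD"    # placeholder for the navigation network

a1 = gated_step(frame=None, enemy_visible=True, aim_policy=aim, nav_policy=nav)
a2 = gated_step(frame=None, enemy_visible=False, aim_policy=aim, nav_policy=nav)
```

The same structure describes TUHO below, with the learned classifier replaced by a Haar detector.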
Clyde (Dino Ratcliffe), the bot that took third place in Track 1, was trained with the vanilla A3C algorithm, rewarded for killing and collecting items, and penalized for suicides.
TUHO (Anssi ‘Miffyli’ Kanervisto, Ville Hautamäki), the bot that took third place in Track 2, was trained similarly to Arnold. TUHO used two independent modules. The navigation module was trained with a Dueling DQN and rewarded for speed, while the aiming module featured a classical Haar detector trained on manually labeled examples. If an enemy was detected in the frame, the agent turned appropriately and fired; otherwise, the navigation network was used. TUHO was trained only on the two supplied maps and two other maps bundled with ViZDoom.
Although the top bots were quite competent and easily coped with Doom’s built-in bots, no agent came close to the human competence level. The bots were decent at tactical decisions (aiming and shooting) but poor on the strategic level (defense, escaping, navigation, exploration). This was especially visible in Track 2, where the maps were larger and required situational awareness: bots often circled around the same location, waiting for targets to appear.
No agent was capable of vertical aiming. That is why Arnold, with its hardcoded crouching, avoided many projectiles, achieving an exceptionally high frags-to-deaths ratio. The other bots had probably never seen a crouching player during training and therefore could not react appropriately.
It was also observed that strafing (moving from side to side without turning) was not used very effectively (if at all), and the agents made no attempts to dodge missiles, a feat performed easily by humans. The bots also did not seem to use any memory, as they did not try to chase opponents escaping from their field of view.
Most of the submitted agents were trained with state-of-the-art (as of 2016) RL algorithms such as A3C and DQN, but the most successful ones additionally addressed the problem of sparse, delayed rewards with auxiliary signals (IntelAct, Arnold) or curriculum training (F1).
Before the contest evaluation itself, three optional warm-up rounds were organized to accustom the participants to the submission and evaluation process and to give them the possibility to check their bots’ effectiveness against each other. The performance of the solutions was tested on known maps, and the results, in the form of tabular data and video recordings, were made publicly available.
The participants were supposed to submit their code along with a list of required dependencies. Logistically, testing the submissions was quite a strenuous process for our small team due to various compatibility issues and program dependencies, especially for solutions employing less standard frameworks. This caused an enormous time overhead and created a need for a more automated verification process. That is why it was decided that Docker containers would be used for the subsequent edition of the competition, relieving the organizers from dealing with dependency and compatibility issues.
IV-D Edition 2017
IV-D1 Changes Compared to the 2016 Edition
The rules and logistics of the 2017 competition did not differ much from the ones of the previous edition. The changes were as follows:
The new version of ViZDoom (1.1.2) was used as the competition environment; all new features described in Section II were allowed for the training or preparation of the bots. The raised player limit (from 8 to 16) made it possible to fit all the submitted bots in a single game.
Each track consisted of 10 matches, each lasting 10 minutes.
Track 2 consisted of 5 previously unseen maps (see Fig. 5), each used for 2 matches. The maps were chosen randomly from four highly rated Doom multi-player map packs, each containing between 6 and 32 maps (88 maps in total), that were selected from a more substantial list of map packs suggested by the ZDoom community. Thus, the selected maps were characterized by good quality and thoughtful design. They were also much smaller than the maps used in Track 2 of the 2016 competition, leading to more frequent interactions between the agents.
A respawn time (an obligatory waiting period after death) of 10 seconds was introduced to encourage gameplay that also focuses on survival instead of reckless killing and to limit the number of frags scored against weaker bots.
The crouching action was disabled, as it gives an enormous advantage over non-crouching players while being achievable by hardcoding a single key press (which was implemented in one of the 2016 submissions). A situation in which an agent learned to crouch effectively on its own would arguably be an achievement, but that was not the case.
Matches were initiated by a dedicated host program (for recording purposes), and all agents' processes were run from Docker containers submitted by the participants.
The winning bots of the 2016 competition were added to each track as baselines; they were also made available to the participants for training or evaluation.
In the previous edition of the competition, most of the participants did not join the warm-up rounds, which made it even more difficult for the organizers, and also for the participants, to estimate the quantity and quality of the final submissions. That is why, in 2017, an obligatory elimination round was introduced. Only teams that had participated in the elimination round and presented sufficiently competent bots were allowed to enter the final round.
The results of the competition are shown in Tables V and VI for Tracks 1 and 2, respectively. For this edition of the competition, the engine was extended to extract additional information about the agents' performance, specifically: the average movement speed (given in km/h, assuming that 128 game units correspond to 3 meters in the real world), the number of performed attacks, and their average shooting precision and detection precision. The shooting precision was calculated as the number of attacks that did damage to an enemy (by a direct hit, a blast from an exploding rocket, or an exploding barrel) divided by the number of all performed attacks. The detection precision is the number of attacks performed when another player was visible to the agent divided by the number of all performed attacks. The engine also counted the number of hits and the damage taken (in the game's health points) by the agents, as well as the number of picked-up items. These statistics are presented in Tables VII and VIII.
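To make the definitions above concrete, the reported statistics can be sketched as follows. This is a minimal illustration with names of our own choosing; the actual measurements are done inside the ViZDoom engine and may be computed differently.

```python
# Illustrative helpers for the statistics described above;
# the function and variable names are ours, not the engine's.

GAME_UNITS_PER_METER = 128 / 3.0  # assumption from the text: 128 game units ~ 3 m


def shooting_precision(damaging_attacks, total_attacks):
    """Share of attacks that did damage to an enemy (direct hit, rocket blast,
    or exploding barrel), in percent."""
    return 100.0 * damaging_attacks / total_attacks if total_attacks else 0.0


def detection_precision(attacks_with_enemy_visible, total_attacks):
    """Share of attacks performed while another player was visible, in percent."""
    return 100.0 * attacks_with_enemy_visible / total_attacks if total_attacks else 0.0


def avg_speed_kmh(distance_game_units, seconds):
    """Average movement speed in km/h from a distance given in game units."""
    meters = distance_game_units / GAME_UNITS_PER_METER
    return meters / seconds * 3.6  # m/s -> km/h
```

For instance, an agent that covered 1280 game units in 10 seconds moved 30 meters, i.e., 10.8 km/h on average.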
The level of the submitted bots was significantly higher in 2017 than in 2016. There were no weak bots in Track 1, and the spread of the frag counts was rather small: the worst bot scored 109 frags while the best one scored 248. The track was won by Marvin, which scored 248 frags, only 3 more than the runner-up, Arnold2, and 33 more than Axon. Interestingly, Marvin did not stand out in terms of accuracy or the ability to avoid rockets; it focused on gathering resources (medkits and armors), which greatly increased its chances of survival. Marvin was hit the largest number of times but, at the same time, was killed the least frequently. Arnold2, on the other hand, was better at aiming (shooting and detection precision).
Notice also that F1, the winner of Track 1 of the previous competition, took fifth place with 164 frags and is, again, characterized by the smallest number of suicides. The number of suicides, in general, did not decrease compared to the 2016 competition and remains high for all the agents.
[Table: Place, Bot, Total Frags, F/D ratio, Kills, Suicides, Deaths]
The level of the bots also improved in Track 2. All of the bots scored more than 50 frags, which means that all of them could move, aim, and shoot opponents. Similarly to Track 1, the gap between the first two bots was tiny. The competition was won by Arnold4, which scored 275 frags and was closely followed by YanShi with 273.
Arnold4 was the most accurate bot in the whole competition and the only bot that did not commit any suicide. This turned out to be crucial for winning against YanShi, which had the same number of 275 kills but committed two suicides. YanShi, however, achieved the highest frags/deaths ratio by being the best at avoiding being killed, and it had the highest detection precision. These two were definitely the best compared to the other agents. The second runner-up, IntelAct, the winner of Track 2 in the previous competition, scored substantially fewer frags (221). The smaller number of items on the maps of Track 2 possibly contributed to the lower position of Marvin, which ended up in fourth place with 193 frags.
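The tie-break between Arnold4 and YanShi follows directly from the deathmatch scoring, in which every suicide subtracts one frag; a sketch consistent with the numbers above:

```python
def frags(kills, suicides):
    """Deathmatch score: each kill of an opponent counts +1, each suicide -1."""
    return kills - suicides


# Both agents killed 275 opponents, but YanShi's two suicides cost it the win.
arnold4 = frags(275, 0)  # 275 frags
yanshi = frags(275, 2)   # 273 frags
```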
[Table: Place, Bot, Avg. speed (km/h), Attacks, Shooting Precision (%), Detection Precision (%), Hits taken, Dmg. taken (hp), Ammo, Medikits, Armors]
[Table: Place, Bot, Avg. speed (km/h), Attacks, Shooting Precision (%), Detection Precision (%)]
[Table: Place, Bot, Hits taken, Dmg. taken (hp), Ammo & Weapons, Medikits & Armors]
|Bot||Framework||Approach|
|Marvin||Tensorflow||A3C, learning from human demonstration|
|Arnold2 & Arnold4||PyTorch||DQN, DRQN|
|YanShi (Track 1)||Tensorflow + Tensorpack||DFP + SLAM + MCTS + manually specified rules|
|YanShi (Track 2)||Tensorflow + Tensorpack||A3C + SLAM + MCTS + manually specified rules|
|Turmio||Tensorflow||A3C + Haar detector|
|TBoy||Tensorflow||A3C + random-grouped curriculum learning|
IV-D3 Notable Solutions
Marvin (Ben Bell) won Track 1 and took fourth place in Track 2. It was a version of the A3C algorithm, pre-trained on replays of human games collected by the authors of the bot and subsequently trained with traditional self-play against the built-in bots. The policy network had separate outputs for actions related to moving and aiming, which also included aiming along the vertical axis. Additionally, a separate handcrafted aiming policy overrode the network's decisions whenever the network was unsure of the correct aiming action.
Arnold2 & Arnold4 (Guillaume Lample, Devendra Singh Chaplot) – slightly modified versions of the 2016 runner-up, Arnold; they differ from the original mostly by the lack of a separate DQN network for navigation, support for strafing, and disabled crouching. The latter two changes might explain the agent's progress – proper strafing makes the game considerably easier, and the lack of crouching encourages the development of more globally useful behaviors. To address the agent's low mobility, which had originally been alleviated by a separate navigation network, Arnold2 and Arnold4 were hardcoded to run straight ahead after being stationary for too long (about 1 second). Both use the same approach and differ only in track-specific details (e.g., manual weapon selection).
Axon (Cheng Ge, Qiu Lu Zhang, Yu Zheng) – the bot that took third place in Track 1. It was trained using the A3C algorithm in a few steps: first, the policy network was trained on a small dataset generated by human players; then, it was trained on various small tasks to learn specific skills like navigation or aiming. Finally, the agent competed against the F1 agent on different maps. The policy network utilized five scales: i) the original image, ii) three images covering the middle parts of the screen, and iii) one image zoomed in on the crosshair.
YanShi (Dong Yan, Shiyu Huang, Chongxuan Li, Yichi Zhou) took second place in Track 2. The bot explicitly separated the perception and planning problems and consisted of two modules. The first one combined a Region Proposal Network (RPN), which detects resources and enemies and was trained with additional supervision using labeled images from the ViZDoom engine, with a Simultaneous Localization And Mapping (SLAM) algorithm, Monte Carlo Tree Search (MCTS), and a set of hardcoded rules. The output of the first module was fed to the second one, which utilized the code of the 2016 agents: F1 (for Track 1) and IntelAct (for Track 2). Like Marvin, YanShi handles vertical aiming by implementing an aimbot based on the RPN output. The manually specified rules include fast rotation using mouse movement to scan the environment, dodging, and avoiding getting stuck in the corners of the map.
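Detector-driven aiming of this kind can be sketched as follows. This is an illustrative fragment under our own naming, not YanShi's actual code: given a bounding box produced by the detector, the agent turns the crosshair toward the box centre.

```python
def aim_deltas(box, screen_w, screen_h, sensitivity=0.1):
    """Map a detected enemy's bounding box (x1, y1, x2, y2), given in pixels,
    to horizontal/vertical mouse deltas that move the crosshair
    (assumed to sit at the screen centre) toward the box centre."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    dx = (cx - screen_w / 2.0) * sensitivity  # positive: turn right
    dy = (cy - screen_h / 2.0) * sensitivity  # positive: aim down
    return dx, dy
```

In ViZDoom, such deltas could be issued through the TURN_LEFT_RIGHT_DELTA and LOOK_UP_DOWN_DELTA buttons; the sensitivity factor and the use of the raw box centre are simplifying assumptions.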
The average level of the agents was arguably higher in the 2017 competition than in the previous year. This is evidenced by the fact that the agents from the previous competition (F1, IntelAct) were defeated by the new submissions. Surprisingly, the largest improvement is visible in Track 1, where the new champion scored 50% more points than the former one, who ended up in fifth place. Nevertheless, the bots are still weaker than humans, especially on the much harder Track 2. One of the authors of this paper can consistently defeat all of them by a considerable margin, although it requires some effort and focus.
The 2017 competition brought several important advancements, such as agents capable of effective strafing and vertical aiming. Nonetheless, the agents did not exhibit more sophisticated tactics such as aiming at the legs (for a much higher chance of blast damage), which is an obvious and popular technique among human players. A3C was the method of choice for RL, while learning from demonstration (Marvin, Axon) started to become a common practice to bootstrap learning and address the problem of sparse, delayed rewards.
In addition to the algorithmic improvements, the synchronized multi-player support in ViZDoom 1.1 allowed for faster training for the 2017 competition. New features and the availability of the winning solutions from the previous competition also opened new possibilities, allowing for a broader range of supervised methods and for training agents against other solutions (YanShi), not only the built-in bots.
Disappointingly, the bots committed a similar number of suicides in 2017 as in 2016 (Track 1). This is directly connected to the low precision of the performed attacks and the inability to understand the surroundings. As a result, the agents often shoot at walls, wounding and often killing themselves. While human players have varying shooting precision, their detection precision is usually close to 100% (i.e., they do not fire when they do not see the enemy). For most agents, the detection precision decreased on the unknown maps of Track 2 and varied significantly depending on the type of environment. It was observed, for example, that specific textures caused some bots, apparently untrained on them, to fire madly at walls.
Due to the small map sizes, the agents encountered each other often. It was also noticed that the agents have little (if any) memory – they often ignore and simply pass by each other without trying to chase the enemy, which would be natural for human players.
In the 2017 competition, the solutions were submitted in the form of Docker images, which made the preparation of software environments easier, removed most of the compatibility issues, and unified the evaluation procedure. Nevertheless, the need for manually building and troubleshooting some of the submissions remained. This has shown that there is a need for a more automated process, preferably one in which solutions can be submitted on a daily basis and are automatically verified and tested, giving immediate feedback to the participants.
This paper presented the first two editions of the Visual Doom AI Competition, held in 2016 and 2017. The challenge was to create bots that compete in a multi-player deathmatch in a first-person shooter (FPS) game, Doom. The bots were to act based on raw pixel information only. The contests received wide media coverage (e.g., https://www.theverge.com/2016/4/22/11486164/ai-visual-doom-competition-cig-2016 or https://www.engadget.com/2016/08/18/vizdoom-research-framework-cig-competition/) and attracted teams from leading AI labs around the world.
The winning bots used a variety of state-of-the-art (at that time) RL algorithms (e.g., A3C, DRQN, DFP). The bots submitted in 2017 became stronger by fine-tuning the algorithms and using more supervision (human replays, curriculum learning). It was also common to learn navigation and fighting separately. The bots are definitely competent but still worse than human players, who can easily exploit their weaknesses. Thus, deathmatch from raw visual input in this FPS game remains an open problem with many research opportunities.
Let us also note that the deathmatch scenario is a relatively easy problem compared to the task of completing the original single-player Doom levels. The latter involves not only reacting appropriately to the current situation but also localization and navigation skills on considerably more complex maps with numerous switches and different kinds of doors, whose matching keys must be found to progress. Therefore, AI for FPS games using raw visual input is yet to be solved, and we predict that the Visual Doom AI Competition will remain a difficult challenge in the near future. To further motivate research towards solving this challenging problem, in the upcoming edition of the competition (2018), the form of Track 1 has been changed: the new task is to develop bots that are capable of finishing randomly generated single-player levels, ranging from trivial to sophisticated, that contain all the elements of the original game.
The paper also revisited ViZDoom (version 1.1.5), the Doom-based platform for research in vision-based RL that was used for the competitions. The framework is easy to use, highly flexible, multi-platform, lightweight, and efficient. In contrast to other popular visual learning environments, such as Atari 2600, ViZDoom provides a 3D, semi-realistic, first-person-perspective virtual world. ViZDoom's API gives the user full control of the environment. Multiple modes of operation facilitate experimentation with different learning paradigms such as RL, adversarial training, apprenticeship learning, learning by demonstration, and even 'ordinary' supervised learning. The strength and versatility of the environment lie in its customizability via a mechanism of scenarios, which can be conveniently programmed with open-source tools. The utility of ViZDoom for research has been proven by the large body of work for which it has been used (e.g., [7, 8, 23, 4, 22]).
This work has been supported in part by Ministry of Science and Education grant no. 91–500/DS. M. Kempka acknowledges the support of Ministry of Science and Higher Education grant no. 09/91/DSPB/0602.
-  T. Atkinson, H. Baier, T. Copplestone, S. Devlin, and J. Swan. The Text-Based Adventure AI Competition. ArXiv e-prints, August 2018.
-  Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, and Stig Petersen. Deepmind lab. CoRR, abs/1612.03801, 2016.
-  Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 41–48, New York, NY, USA, 2009. ACM.
-  Shehroze Bhatti, Alban Desmaison, Ondrej Miksik, Nantas Nardelli, N. Siddharth, and Philip H. S. Torr. Playing doom with slam-augmented deep reinforcement learning. CoRR, abs/1612.00380, 2016.
-  Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, and Aaron C. Courville. Home: a household multimodal environment. CoRR, abs/1711.11017, 2017.
-  Michal Certicky and David Churchill. The current state of StarCraft AI competitions and bots. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2017.
-  Devendra Singh Chaplot, Guillaume Lample, and Ruslan Salakhutdinov. Transfer deep reinforcement learning in 3d environments: An empirical study. In NIPS Deep Reinforcement Learning Workshop. 2016.
-  Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, and Ruslan Salakhutdinov. Gated-attention architectures for task-oriented language grounding. CoRR, abs/1706.07230, 2017.
-  Alexey Dosovitskiy and Vladlen Koltun. Learning to act by predicting the future. In International Conference on Learning Representations (ICLR), 2017.
-  Matthew J. Hausknecht and Peter Stone. Deep recurrent q-learning for partially observable mdps. In International Conference on Learning Representations (ICLR), 2017.
-  Shiyu Huang and Deva Ramanan. Recognition in-the-tail: Training detectors for unusual pedestrians with synthetic imposters. CoRR, abs/1703.06283, 2017.
-  Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, and Koray Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. In International Conference on Learning Representations (ICLR), 2017.
-  Matthew Johnson, Katja Hofmann, Tim Hutton, and David Bignell. The malmo platform for artificial intelligence experimentation. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, pages 4246–4247. AAAI Press, 2016.
-  Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In IEEE Conference on Computational Intelligence and Games, pages 341–348, Santorini, Greece, Sep 2016. IEEE. The best paper award.
-  Lukasz Kidzinski, Sharada Prasanna Mohanty, Carmichael F. Ong, Zhewei Huang, Shuchang Zhou, Anton Pechenko, Adam Stelmaszczyk, Piotr Jarosik, Mikhail Pavlov, Sergey Kolesnikov, Sergey M. Plis, Zhibo Chen, Zhizheng Zhang, Jiale Chen, Jun Shi, Zhuobin Zheng, Chun Yuan, Zhihui Lin, Henryk Michalewski, Piotr Milos, Blazej Osinski, Andrew Melnik, Malte Schilling, Helge Ritter, Sean F. Carroll, Jennifer L. Hicks, Sergey Levine, Marcel Salathé, and Scott L. Delp. Learning to run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments. CoRR, abs/1804.00361, 2018.
-  Eric Kolve, Roozbeh Mottaghi, Daniel Gordon, Yuke Zhu, Abhinav Gupta, and Ali Farhadi. AI2-THOR: An Interactive 3D Environment for Visual AI. arXiv, 2017.
-  Guillaume Lample and Devendra Singh Chaplot. Playing FPS games with deep reinforcement learning. In 31st AAAI Conference on Artificial Intelligence, 2016.
-  Joel Z. Leibo, Cyprien de Masson d’Autume, Daniel Zoran, David Amos, Charles Beattie, Keith Anderson, Antonio García Castañeda, Manuel Sanchez, Simon Green, Audrunas Gruslys, Shane Legg, Demis Hassabis, and Matthew Botvinick. Psychlab: A psychology laboratory for deep reinforcement learning agents. CoRR, abs/1801.08116, 2018.
-  Dirk Merkel. Docker: Lightweight linux containers for consistent development and deployment. Linux J., 2014(239), March 2014.
-  Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (ICML), 2016.
-  Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, February 2015.
-  Emilio Parisotto and Ruslan Salakhutdinov. Neural map: Structured memory for deep reinforcement learning. CoRR, abs/1702.08360, 2017.
-  Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning (ICML), 2017.
-  Diego Perez-Liebana, Spyridon Samothrakis, Julian Togelius, Simon M. Lucas, and Tom Schaul. General video game ai: Competition, challenges, and opportunities. In 30th AAAI Conference on Artificial Intelligence, AAAI 2016, pages 4335–4337. AAAI press, 2016.
-  Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech Badia, Oriol Vinyals, Demis Hassabis, Daan Wierstra, and Charles Blundell. Neural episodic control. CoRR, abs/1703.01988, 2017.
-  Weichao Qiu and Alan Yuille. Unrealcv: Connecting computer vision to unreal engine. In European Conference on Computer Vision, pages 909–916. Springer, 2016.
-  Weichao Qiu, Fangwei Zhong, Yi Zhang, Siyuan Qiao, Zihao Xiao, Tae Soo Kim, Yizhou Wang, and Alan Yuille. Unrealcv: Virtual worlds for computer vision. ACM Multimedia Open Source Software Competition, 2017.
-  Dino Ratcliffe, Sam Devlin, Udo Kruschwitz, and Luca Citi. Clyde: A deep reinforcement learning doom playing agent. In AAAI Workshops, 2017.
-  David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, jan 2016.
-  David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy P. Lillicrap, Karen Simonyan, and Demis Hassabis. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. CoRR, abs/1712.01815, 2017.
-  Ivan Sorokin, Alexey Seleznev, Mikhail Pavlov, Aleksandr Fedorov, and Anastasiia Ignateva. Deep attention recurrent q-network. CoRR, abs/1512.01693, 2015.
-  Julian Togelius. How to run a successful game-based ai competition. IEEE Transactions on Computational Intelligence and AI in Games, 8(1):95–100, 2016.
-  Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 511–518, 2001.
-  Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot, and Nando De Freitas. Dueling network architectures for deep reinforcement learning. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, pages 1995–2003. JMLR.org, 2016.
-  P. R. Williams, D. Perez-Liebana, and S. M. Lucas. Ms. pac-man versus ghost team cig 2016 competition. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1–8, Sept 2016.
-  Y. Wu, Y. Wu, G. Gkioxari, and Y. Tian. Building Generalizable Agents with a Realistic and Rich 3D Environment. ArXiv e-prints, January 2018.
-  Yuxin Wu and Yuandong Tian. Training agent for first-person shooter game with actor-critic curriculum learning. In International Conference on Learning Representations (ICLR), 2017.
-  C. Yan, D. Misra, A. Bennnett, A. Walsman, Y. Bisk, and Y. Artzi. CHALET: Cornell House Agent Learning Environment. ArXiv e-prints, January 2018.