Reinforcement learning in 3D.
DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds. DeepMind Lab has a simple and flexible API enabling creative task-designs and novel AI-designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community.
General intelligence measures an agent’s ability to achieve goals in a wide range of environments (legg2007universal). The only known examples of general-purpose intelligence arose from a combination of evolution, development, and learning, grounded in the physics of the real world and the sensory apparatus of animals. An unknown, but potentially large, fraction of animal and human intelligence is a direct consequence of the perceptual and physical richness of our environment, and is unlikely to arise without it (e.g. locke1690essay; hume1739treatise). One option is to directly study embodied intelligence in the real world itself using robots (e.g. brooks1990elephants; metta2008icub). However, progress on that front will always be hindered by the too-slow passing of real time and the expense of the physical hardware involved. Realistic virtual worlds on the other hand, if they are sufficiently detailed, can get the best of both, combining perceptual and physical near-realism with the speed and flexibility of software.
Previous efforts to construct realistic virtual worlds as platforms for AI research have been stymied by the considerable engineering involved. To fill the gap, we present DeepMind Lab. DeepMind Lab is a first-person 3D game platform built on top of id Software’s Quake III Arena (QuakeThree) engine. The world is rendered with rich science fiction-style visuals. The available actions are looking around and moving in 3D. Example tasks include navigation in mazes, collecting fruit, traversing dangerous passages and avoiding falling off cliffs, bouncing through space using launch pads to move between platforms, laser tag, quickly learning and remembering random procedurally generated environments, and tasks inspired by neuroscience experiments. DeepMind Lab is already a major research platform within DeepMind. In particular, it has been used to develop asynchronous methods for reinforcement learning (mnih2016asynchronous) and unsupervised auxiliary tasks (jaderberg2016reinforcement), and to study navigation (mirowski2016learning).
DeepMind Lab may be compared to other game-based AI research platforms emphasising pixels-to-actions autonomous learning agents. The Arcade Learning Environment (Atari) (bellemare2012arcade), which we have used extensively at DeepMind, is neither 3D nor first-person. Among 3D platforms for AI research, DeepMind Lab is comparable to others like VizDoom (kempka2016vizdoom) and Minecraft (johnson2016malmo; tessler2016deep). However, it goes beyond those platforms in several respects: it has considerably richer visuals and more naturalistic physics, and its action space allows for fine-grained pointing in a fully 3D world. Compared to VizDoom, DeepMind Lab is more removed from its origin in a first-person shooter video game. This work is different from, and complementary to, other recent projects which run as plugins to access internal content in the Unreal engine (qiu2016unrealcv; lerer2016learning). Any of these systems can be used to generate static datasets for computer vision, as described e.g. in mahendran2016researchdoom; richter2016playing.
Artificial general intelligence (AGI) research in DeepMind Lab emphasises 3D vision from raw pixel inputs, first-person (egocentric) viewpoints, fine motor dexterity, navigation, planning, strategy, time, and fully autonomous agents that must learn for themselves what tasks to perform by exploration of their environment. All these factors make learning difficult. Each is considered a frontier research question in its own right. Putting them all together in one platform, as we have, represents a significant challenge for the field.
DeepMind Lab is built on top of id Software’s Quake III Arena (QuakeThree) engine using the ioquake3 (nussel2016ioquake3) version of the codebase, which is actively maintained by enthusiasts in the open source community. DeepMind Lab also includes tools from q3map2 (gtkradiant) and bspc (bspc) for level generation. The bot scripts are based on code from the OpenArena (OpenArena) project.
A custom set of assets was created to give the platform a unique and stylised look and feel, with a focus on rich visuals tailored for machine learning.
A reinforcement learning API has been built on top of the game engine, providing agents with complex observations and accepting a rich set of actions.
The interaction with the platform is lock-stepped, with the engine stepped forward one simulation step (or multiple with repeated actions, if desired) at a time, according to a user-specified frame rate. Thus, the game is effectively paused after an observation is provided until an agent provides the next action(s) to take.
At each step, the engine provides reward, pixel-based observations and, optionally, velocity information (figure 1):
The reward signal is a scalar value that is effectively the score of each level.
The platform provides access to the raw pixels as rendered by the game engine from the player’s first-person perspective, formatted as RGB pixels. There is also an RGBD format, which additionally exposes per-pixel depth values, mimicking the range sensors used in robotics and biological stereo-vision.
For certain research applications the agent’s translational and angular velocities may be useful. These are exposed as two separate three-dimensional vectors.
Agents can provide multiple simultaneous actions to control movement (forward/back, strafe left/right, crouch, jump), looking (up/down, left/right) and tagging (in laser tag levels with opponent bots), as illustrated in figure 2.
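The lock-stepped interaction and the simultaneous actions described above can be sketched with a toy stand-in. This is a minimal illustration under stated assumptions, not the DeepMind Lab API: `Action`, `ToyLockStepEngine`, their fields, and the toy reward rule are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical composite action: every field can be set in the same step."""
    look_left_right: int = 0    # negative looks left, positive looks right
    look_down_up: int = 0       # negative looks down, positive looks up
    strafe_left_right: int = 0  # -1 left, 0 none, +1 right
    move_back_forward: int = 0  # -1 back, 0 none, +1 forward
    jump: bool = False
    crouch: bool = False
    tag: bool = False


class ToyLockStepEngine:
    """Toy engine (hypothetical) that only advances when the agent calls step()."""

    def __init__(self, fps: int = 60) -> None:
        self.fps = fps
        self.frame = 0

    def step(self, action: Action, num_steps: int = 1) -> float:
        """Repeat `action` for `num_steps` frames and return the summed reward."""
        total_reward = 0.0
        for _ in range(num_steps):
            self.frame += 1
            # Toy rule (an assumption): moving forward earns 1.0 on every 10th frame.
            if action.move_back_forward > 0 and self.frame % 10 == 0:
                total_reward += 1.0
        return total_reward

    def simulated_seconds(self) -> float:
        """Simulated time depends on frames stepped, not on wall-clock time."""
        return self.frame / self.fps


engine = ToyLockStepEngine(fps=60)
run_and_jump = Action(move_back_forward=1, jump=True, look_left_right=5)
first = engine.step(run_and_jump, num_steps=4)   # advances frames 1 to 4
second = engine.step(run_and_jump, num_steps=6)  # advances frames 5 to 10
print(first, second, engine.frame)
```

Because the engine advances only inside `step`, an agent can take arbitrarily long to choose its next action without any simulated time passing.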
Simple fruit gathering levels with a static map. The goal of these levels is to collect apples (small positive reward) and melons (large positive reward) while avoiding lemons (small negative reward).
Navigation levels with a static map layout. These levels test the agent’s ability to find its way to a goal in a fixed maze that remains the same across episodes. The starting location is random. In the random goal variant, the location of the goal changes in every episode. The optimal policy is to find the goal’s location at the start of each episode and then use long-term knowledge of the maze layout to return to it as quickly as possible from any location. The static variant is simpler: the goal location is fixed for all episodes and only the agent’s starting location changes, so the optimal policy does not require the first step of exploring to find the current goal location. The specific layouts are shown in figure 3.
Procedurally-generated navigation levels requiring effective exploration of a new maze generated on-the-fly at the start of each episode. These levels test the agent’s ability to explore a totally new environment. The optimal policy would begin by exploring the maze to rapidly learn its layout and then exploit that knowledge to repeatedly return to the goal as many times as possible before the end of the episode (three minutes).
Laser-tag levels requiring agents to wield laser-like science fiction gadgets to tag bots controlled by the game’s in-built AI. A reward is delivered whenever the agent tags a bot by reducing its shield to 0. These levels approximate the usual gameplay from Quake III Arena. One level has a sloped arena, requiring the agent to look up and down. Two levels contain pits that the agent must jump over and avoid falling into. In two levels, the colours and textures of the bots are randomly generated at the start of each episode. This prevents agents from relying on colour for bot detection. These levels test aspects of fine-control (for aiming), planning (to anticipate where bots are likely to move), strategy (to control key areas of the map such as gadget spawn points), and robustness to the substantial visual complexity arising from the large numbers of independently moving objects (gadget projectiles and bots).
The original game engine is written in C and, to ensure compatibility with future changes to the engine, it has only been modified where necessary. DeepMind Lab provides a simple C API and ships with Python bindings.
The platform includes an extensive level API, written in Lua, to allow custom level creation and mechanics. This approach has resulted in a highly flexible platform with minimal changes to the original game engine.
DeepMind Lab supports Linux and has been tested on several major distributions.
The engine can be run either in a window, or it can be run headless for higher performance and support for non-windowed environments like a remote terminal. Rendering uses OpenGL and can make use of either a GPU or a software renderer.
A DeepMind Lab instance is initialised with the user’s settings for level name, screen resolution and frame rate. After initialisation, a simple RL-style API is used to interact with the environment, as shown in the Python example figure.
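The initialise-then-interact pattern might look roughly like the following. This sketch uses a stub rather than the platform’s Python bindings: `StubLab`, its method names, the level name `demo_level`, and the constructor parameters are assumptions made for illustration only.

```python
import random


class StubLab:
    """Stand-in environment (hypothetical); it renders nothing."""

    def __init__(self, level: str, width: int, height: int, fps: int,
                 episode_frames: int = 100) -> None:
        self.level, self.width, self.height, self.fps = level, width, height, fps
        self._episode_frames = episode_frames
        self._frame = 0

    def reset(self) -> None:
        """Start a new episode."""
        self._frame = 0

    def is_running(self) -> bool:
        """True until the episode's frame budget is used up."""
        return self._frame < self._episode_frames

    def observations(self) -> dict:
        # Blank RGB frame: one byte per channel, three channels per pixel.
        return {"RGB": bytes(self.width * self.height * 3)}

    def step(self, action: int) -> float:
        """Advance one frame and return a toy reward (not a real level score)."""
        self._frame += 1
        return 1.0 if action == 0 else 0.0


def run_episode(env: StubLab, rng: random.Random, num_actions: int = 3) -> float:
    """Standard RL loop: observe, choose an action, step, accumulate reward."""
    env.reset()
    score = 0.0
    while env.is_running():
        pixels = env.observations()["RGB"]  # a learning agent would condition on this
        assert len(pixels) == env.width * env.height * 3
        action = rng.randrange(num_actions)  # random policy as a placeholder
        score += env.step(action)
    return score


env = StubLab(level="demo_level", width=84, height=84, fps=60)
score = run_episode(env, random.Random(0))
print(f"episode score: {score}")
```

Keeping the settings in the constructor and the episode logic in `run_episode` mirrors the two phases described above: configuration happens once, and the observe-act loop repeats until the episode ends.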
In the Lua-based level API each level can be customised further with logic for bots, item pickups, custom observations, level restarts, reward schemes, in-game messages and many other aspects.
Tables 1 and 2 show the platform’s performance at different resolutions for two typical levels included with the platform. The frame rates listed were computed by connecting an agent performing random actions via the Python API. This agent has insignificant overhead so the results are dominated by engine simulation and rendering times.
The benchmarks were run on a Linux desktop with a 6-core Intel Xeon 3.50GHz CPU and an NVIDIA Quadro K600 GPU.
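Table 1: frame rate (frames/second) by resolution, first level.
|84 x 84||199.7||189.6||996.6||995.8|
|160 x 120||86.8||85.4||973.2||989.2|
|320 x 240||27.3||27.0||950.0||784.7|
Table 2: frame rate (frames/second) by resolution, second level.
|84 x 84||286.7||263.3||866.0||850.3|
|160 x 120||237.7||263.6||903.7||767.9|
|320 x 240||82.2||98.0||796.2||657.8|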
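A frame-rate measurement of the kind described above, with a random-action agent driving the environment, could be approximated as follows. This is a sketch under stated assumptions rather than the benchmark code used for the tables: `measure_fps` and `dummy_step` are hypothetical names, and `dummy_step` merely stands in for one engine simulation and rendering step.

```python
import random
import time


def measure_fps(step_fn, num_frames: int, num_actions: int, seed: int = 0) -> float:
    """Drive `step_fn` with uniformly random actions and return frames per second."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(num_frames):
        step_fn(rng.randrange(num_actions))
    # Guard against a zero interval on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return num_frames / elapsed


def dummy_step(action: int) -> None:
    """Placeholder for one environment step; it just burns a little CPU."""
    sum(range(1000))


fps = measure_fps(dummy_step, num_frames=2000, num_actions=7)
print(f"{fps:.1f} frames/second")
```

Choosing the action with a cheap random draw keeps the agent's overhead small, so the measured rate mostly reflects the cost of the step function itself.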
Machine learning results from early versions of the DeepMind Lab platform can be found in mnih2016asynchronous; jaderberg2016reinforcement; mirowski2016learning.
DeepMind Lab enables research in a 3D world with rich science fiction visuals and game-like physics. DeepMind Lab facilitates creative task development. A wide range of environments, tasks, and intelligence tests can be built with it. We are excited to see what the research community comes up with.
This work would not have been possible without the support of DeepMind and our many colleagues there who have helped mature the platform. In particular we would like to thank Thomas Köppe, Hado van Hasselt, Volodymyr Mnih, Dharshan Kumaran, Timothy Lillicrap, Raia Hadsell, Andrea Banino, Piotr Mirowski, Antonio Garcia, Timo Ewalds, Colin Murdoch, Chris Apps, Andreas Fidjeland, Max Jaderberg, Wojtek Czarnecki, Georg Ostrovski, Audrunas Gruslys, David Reichert, Tim Harley and Hubert Soyer.