Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

11/19/2019
by Julian Schrittwieser, et al.

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
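
As a rough illustration of what "a model that, when applied iteratively, predicts the reward, the action-selection policy, and the value function" could look like, the sketch below unrolls a tiny, untrained latent model over a sequence of actions. The function names (represent, dynamics, predict, unroll), the layer sizes, and the random linear weights are assumptions made for this example only; it omits both the tree search and the training procedure, and it is not the authors' implementation.

```python
import math
import random

OBS_SIZE = 8    # hypothetical observation size
HIDDEN = 4      # hypothetical latent-state size
N_ACTIONS = 3   # hypothetical number of discrete actions

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Random weights standing in for a trained network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


W_REPR = rand_matrix(HIDDEN, OBS_SIZE)
W_DYN = rand_matrix(HIDDEN, HIDDEN + N_ACTIONS)
W_REWARD = rand_matrix(1, HIDDEN + N_ACTIONS)
W_POLICY = rand_matrix(N_ACTIONS, HIDDEN)
W_VALUE = rand_matrix(1, HIDDEN)


def represent(observation):
    """Encode an observation into a latent state (no game rules involved)."""
    return [math.tanh(v) for v in matvec(W_REPR, observation)]


def dynamics(state, action):
    """Predict the next latent state and the immediate reward for an action."""
    one_hot = [1.0 if i == action else 0.0 for i in range(N_ACTIONS)]
    inputs = state + one_hot
    next_state = [math.tanh(v) for v in matvec(W_DYN, inputs)]
    reward = matvec(W_REWARD, inputs)[0]
    return next_state, reward


def predict(state):
    """Predict the policy (action probabilities) and value of a latent state."""
    policy = softmax(matvec(W_POLICY, state))
    value = matvec(W_VALUE, state)[0]
    return policy, value


def unroll(observation, actions):
    """Apply the model iteratively, collecting (reward, policy, value) per step."""
    state = represent(observation)
    outputs = []
    for action in actions:
        state, reward = dynamics(state, action)
        policy, value = predict(state)
        outputs.append((reward, policy, value))
    return outputs


steps = unroll([0.1] * OBS_SIZE, actions=[0, 2, 1])
for reward, policy, value in steps:
    print(round(reward, 3), [round(p, 3) for p in policy], round(value, 3))
```

The point of the sketch is that after the first encoding step the real environment is never consulted again: every later prediction is made from the latent state alone, which is what would let a planner search ahead without a simulator.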

