Thinking Fast and Slow with Deep Learning and Tree Search

05/23/2017
by Thomas Anthony, et al.

Sequential decision-making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep reinforcement learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.
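
The plan-then-generalise loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it makes several simplifying assumptions: the game is a toy version of Nim rather than Hex, the "expert" is a flat Monte Carlo search (one-ply lookahead with rollouts) rather than a full tree search, and the "apprentice" is a table of imitation counts standing in for the deep neural network. All names (`TabularApprentice`, `expert_move`, `expert_iteration`) are hypothetical.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # toy Nim (assumption): take 1..3 stones; taking the last stone wins


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


class TabularApprentice:
    """Stand-in for the neural network: smoothed imitation counts per state."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(float))

    def probs(self, stones):
        moves = legal_moves(stones)
        weights = [1.0 + self.counts[stones][m] for m in moves]  # add-one smoothing
        total = sum(weights)
        return moves, [w / total for w in weights]

    def sample(self, stones, rng):
        moves, p = self.probs(stones)
        return rng.choices(moves, weights=p, k=1)[0]

    def fit(self, dataset):
        # "Generalisation" step: imitate the expert's chosen moves.
        for stones, move in dataset:
            self.counts[stones][move] += 1.0


def rollout(stones, apprentice, rng):
    """Play out with the apprentice policy; +1 if the player to move wins, else -1."""
    player = 0
    while True:
        stones -= apprentice.sample(stones, rng)
        if stones == 0:
            return 1 if player == 0 else -1
        player ^= 1


def expert_move(stones, apprentice, rng, n_rollouts=30):
    """"Planning" step: flat Monte Carlo search guided by the apprentice policy."""
    if stones <= MAX_TAKE:
        return stones  # immediate win: take everything
    best_move, best_value = None, float("-inf")
    for m in legal_moves(stones):
        rest = stones - m
        # The opponent moves next, so negate the rollout result.
        value = -sum(rollout(rest, apprentice, rng) for _ in range(n_rollouts)) / n_rollouts
        if value > best_value:
            best_move, best_value = m, value
    return best_move


def expert_iteration(n_iters=5, start=10, games=20, seed=0):
    rng = random.Random(seed)
    apprentice = TabularApprentice()
    for _ in range(n_iters):
        dataset = []
        for _ in range(games):
            stones = rng.randint(1, start)
            while stones > 0:
                move = expert_move(stones, apprentice, rng)  # expert plans
                dataset.append((stones, move))
                stones -= move
        apprentice.fit(dataset)  # apprentice generalises the expert's plans
    return apprentice
```

Each iteration uses the current apprentice inside the search, so a better apprentice should in principle yield stronger expert moves to imitate in the next round. A call such as `expert_iteration(n_iters=3)` returns the trained table, whose `probs(stones)` method gives the imitation policy for a given pile size.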

