TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning

10/31/2017
by   Gregory Farquhar, et al.
0

Combining deep model-free reinforcement learning with on-line planning is a promising approach to building on the successes of deep RL. On-line planning with look-ahead trees has proven successful in environments where transition models are known a priori. However, in complex environments where transition models need to be learned from data, the deficiencies of learned models have limited their utility for planning. To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions. TreeQN dynamically constructs a tree by recursively applying a transition model in a learned abstract state space and then aggregating predicted rewards and state-values using a tree backup to estimate Q-values. We also propose ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network. Both approaches are trained end-to-end, such that the learned model is optimised for its actual use in the planner. We show that TreeQN and ATreeC outperform n-step DQN and A2C on a box-pushing task, as well as n-step DQN and value prediction networks (Oh et al., 2017) on multiple Atari games, with deeper trees often outperforming shallower ones. We also present a qualitative analysis that sheds light on the trees learned by TreeQN.

READ FULL TEXT

page 4

page 8

research
07/11/2017

Value Prediction Network

This paper proposes a novel deep reinforcement learning (RL) architectur...
research
01/11/2019

An investigation of model-free planning

The field of reinforcement learning (RL) is facing increasingly challeng...
research
04/05/2020

Reinforcement Learning Architectures: SAC, TAC, and ESAC

The trend is to implement intelligent agents capable of analyzing availa...
research
12/28/2016

The Predictron: End-To-End Learning and Planning

One of the key challenges of artificial intelligence is to learn models ...
research
06/15/2018

Improving width-based planning with compact policies

Optimal action selection in decision problems characterized by sparse, d...
research
02/08/2022

GrASP: Gradient-Based Affordance Selection for Planning

Planning with a learned model is arguably a key component of intelligenc...
research
02/07/2019

Deeper & Sparser Exploration

We address the problem of efficient exploration by proposing a new meta ...

Please sign up or login with your details

Forgot password? Click here to reset