Improving width-based planning with compact policies

06/15/2018
by Miquel Junyent, et al.

Optimal action selection in decision problems characterized by sparse, delayed rewards is still an open challenge. For these problems, current deep reinforcement learning methods require enormous amounts of data to learn controllers that reach human-level performance. In this work, we propose a method that interleaves planning and learning to address this issue. The planning step hinges on the Iterated-Width (IW) planner, a state-of-the-art planner that makes explicit use of the state representation to perform structured exploration. IW scales to problems independently of the size of the state space. From the state-action pairs visited by IW, the learning step estimates a compact policy, which in turn is used to guide the planning step. The type of exploration used by our method is radically different from the standard random exploration used in RL. We evaluate our method on simple problems, where it outperforms the state-of-the-art reinforcement learning algorithms A2C and AlphaZero. Finally, we present preliminary results on a subset of the Atari games suite.
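
To make the idea of structured exploration concrete, the following is a minimal sketch of IW(1), the simplest member of the Iterated-Width family: a breadth-first search that keeps a newly generated state only if it makes some (feature, value) pair true for the first time. It is not the authors' implementation, and it leaves out the learned policy that the method above uses to guide planning. The names `iw1` and `grid_successors`, the `max_nodes` budget, and the toy grid domain are all assumptions made for illustration.

```python
from collections import deque


def iw1(initial_state, successors, features, is_goal, max_nodes=10_000):
    """Breadth-first search with IW(1) novelty pruning (illustrative sketch).

    A generated state is kept only if at least one (feature index, value)
    pair has not appeared in any previously generated state.
    """
    seen_atoms = set()

    def is_novel(state):
        atoms = set(enumerate(features(state)))
        new_atoms = atoms - seen_atoms
        seen_atoms.update(new_atoms)
        return bool(new_atoms)

    is_novel(initial_state)  # register the atoms of the root state
    queue = deque([(initial_state, [])])
    expanded = 0
    while queue and expanded < max_nodes:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan, expanded
        expanded += 1
        for action, nxt in successors(state):
            if is_novel(nxt):  # states with no new atom are pruned
                queue.append((nxt, plan + [action]))
    return None, expanded


# Hypothetical toy domain: an N x N grid whose state is the pair (x, y).
N = 5
MOVES = [("right", (1, 0)), ("up", (0, 1)), ("left", (-1, 0)), ("down", (0, -1))]


def grid_successors(state):
    x, y = state
    for name, (dx, dy) in MOVES:
        nx, ny = x + dx, y + dy
        if 0 <= nx < N and 0 <= ny < N:
            yield name, (nx, ny)


plan, expanded = iw1(
    initial_state=(0, 0),
    successors=grid_successors,
    features=lambda s: s,               # the two features are x and y
    is_goal=lambda s: s[0] == N - 1,    # reach the right-hand column
)
print(plan, expanded)
```

In this toy domain there are only 2N distinct (feature, value) pairs, so the search expands at most 2N states even though the grid has N² of them. That is the sense in which the effort of IW(1) depends on the state representation rather than on the size of the state space. The price is incompleteness: a goal that requires two features to take specific values jointly, such as the opposite corner of the grid, is pruned away by IW(1) and needs a larger width.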

