Deep Policies for Width-Based Planning in Pixel Domains

04/12/2019
by Miquel Junyent, et al.

Width-based planning has demonstrated great success in recent years due to its ability to scale independently of the size of the state space. For example, Bandres et al. (2018) introduced a rollout version of the Iterated Width algorithm whose performance compares well with humans and learning methods in the pixel setting of the Atari games suite. In this setting, planning is done on-line using the "screen" states, and actions are selected by looking ahead into the future. However, this algorithm is purely exploratory and does not leverage past reward information. Furthermore, it requires the state to be factored into features that need to be pre-defined for the particular task, e.g., the B-PROST pixel features. In this work, we extend width-based planning by incorporating an explicit policy in the action selection mechanism. Our method, called π-IW, interleaves width-based planning and policy learning using the state-action pairs visited by the planner. The policy estimate takes the form of a neural network and is in turn used to guide the planning step, thus reinforcing promising paths. Surprisingly, we observe that the representation learned by the neural network can be used as a feature space for the width-based planner without degrading its performance, thus removing the requirement of pre-defined features for the planner. We compare π-IW with previous width-based methods and with AlphaZero, a method that also interleaves planning and learning, in simple environments, and show that π-IW has superior performance. We also show that the π-IW algorithm outperforms previous width-based methods in the pixel setting of the Atari games suite.
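To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the basic IW(1) novelty test (a state is kept only if it makes some feature atom true for the first time), which is not necessarily the exact test used by the rollout variant or by π-IW, and it assumes the policy network outputs one logit per action. The names `NoveltyTable`, `softmax`, and `sample_action` are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import Hashable, List, Sequence, Set, Tuple

# An atom pairs a feature index with the value that feature takes.
Atom = Tuple[int, Hashable]


class NoveltyTable:
    """Records feature atoms seen so far (basic IW(1) novelty, an assumption)."""

    def __init__(self) -> None:
        self.seen: Set[Atom] = set()

    def is_novel(self, features: Sequence[Hashable]) -> bool:
        """Return True if the state brings at least one unseen atom.

        All atoms of the state are recorded as a side effect.
        """
        atoms = set(enumerate(features))
        new_atoms = atoms - self.seen
        self.seen |= atoms
        return bool(new_atoms)


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over action logits."""
    top = max(logits)
    exps = [math.exp((x - top) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: Sequence[float], rng: random.Random,
                  temperature: float = 1.0) -> int:
    """Sample an action index in proportion to the policy probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In a planner built along these lines, `sample_action` would choose which branch to expand from the policy's logits, and `is_novel` would decide whether the resulting state is kept or pruned. How the visited state-action pairs are turned into training targets for the network is left out here, since the abstract does not specify it.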


research
06/15/2018

Improving width-based planning with compact policies

Optimal action selection in decision problems characterized by sparse, d...
research
01/15/2021

Hierarchical Width-Based Planning and Learning

Width-based search methods have demonstrated state-of-the-art performanc...
research
01/10/2018

Planning with Pixels in (Almost) Real Time

Recently, width-based planning methods have been shown to yield state-of...
research
12/16/2020

Planning From Pixels in Atari With Learned Symbolic Representations

Width-based planning methods have been shown to yield state-of-the-art p...
research
09/30/2021

Width-Based Planning and Active Learning for Atari

Width-based planning has shown promising results on Atari 2600 games usi...
research
05/10/2021

Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches: Extended Version

Width-based planning methods exploit the use of conjunctive goals for de...
research
06/23/2021

Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark

We propose new width-based planning and learning algorithms applied over...
