An investigation of model-free planning

01/11/2019
by Arthur Guez, et al.

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically used an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods has been proposed that learns how to plan by providing the structure for planning via an inductive bias in the function approximator (such as a tree-structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to use additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state of the art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that rely on strong inductive biases toward planning.

research
08/24/2022

A model-based approach to meta-Reinforcement Learning: Transformers and tree search

Meta-learning is a line of research that develops the ability to leverag...
research
07/26/2019

On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

How to best explore in domains with sparse, delayed, and deceptive rewar...
research
10/31/2017

TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning

Combining deep model-free reinforcement learning with on-line planning i...
research
12/02/2021

Residual Pathway Priors for Soft Equivariance Constraints

There is often a trade-off between building deep learning systems that a...
research
03/26/2021

Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow

In the past decade, model-free reinforcement learning (RL) has provided ...
research
12/03/2019

Adaptive Online Planning for Continual Lifelong Learning

We study learning control in an online lifelong learning scenario, where...
research
02/19/2022

Who Are the Best Adopters? User Selection Model for Free Trial Item Promotion

With the increasingly fierce market competition, offering a free trial h...
