Local Search for Policy Iteration in Continuous Control

10/12/2020
by Jost Tobias Springenberg, et al.

We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based policy improvement during learning can improve data efficiency, and confirm that model-based policy improvement during action selection can also be beneficial. Quantitatively, our algorithm improves data efficiency on several continuous control benchmarks (when a model is learned in parallel), and it provides significant improvements in wall-clock time in high-dimensional domains (when a ground truth model is available). The unified framework also helps us to better understand the space of model-based and model-free algorithms. In particular, we demonstrate that some benefits attributed to model-based RL can be obtained without a model, simply by utilizing more computation.
