Local Search for Policy Iteration in Continuous Control

10/12/2020
by   Jost Tobias Springenberg, et al.
0

We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based policy improvement during learning can improve data efficiency, and confirm that model-based policy improvement during action selection can also be beneficial. Quantitatively, our algorithm improves data efficiency on several continuous control benchmarks (when a model is learned in parallel), and it provides significant improvements in wall-clock time in high-dimensional domains (when a ground truth model is available). The unified framework also helps us to better understand the space of model-based and model-free algorithms. In particular, we demonstrate that some benefits attributed to model-based RL can be obtained without a model, simply by utilizing more computation.

READ FULL TEXT
06/09/2020

Variational Model-based Policy Optimization

Model-based reinforcement learning (RL) algorithms allow us to combine m...
09/15/2019

Policy Prediction Network: Model-Free Behavior Policy with Model-Based Learning in Continuous Action Space

This paper proposes a novel deep reinforcement learning architecture tha...
01/15/2021

Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Using a high Update-To-Data (UTD) ratio, model-based methods have recent...
04/13/2021

Muesli: Combining Improvements in Policy Optimization

We propose a novel policy update that combines regularized policy optimi...
05/03/2019

Information asymmetry in KL-regularized RL

Many real world tasks exhibit rich structure that is repeated across dif...
06/13/2020

Reinforcement Learning as Iterative and Amortised Inference

There are several ways to categorise reinforcement learning (RL) algorit...
04/14/2022

Accelerated Policy Learning with Parallel Differentiable Simulation

Deep reinforcement learning can generate complex control policies, but r...