Learning Off-Policy with Online Planning

08/23/2020
by Harshit Sikchi, et al.

We propose Learning Off-Policy with Online Planning (LOOP), which combines techniques from model-based and model-free reinforcement learning. The agent learns a model of the environment and then uses trajectory optimization with the learned model to select actions. To sidestep the myopic effect of fixed-horizon trajectory optimization, a value function is attached to the end of the planning horizon; this value function is learned through off-policy reinforcement learning, using trajectory optimization as its behavior policy. We further introduce "actor-guided" trajectory optimization to mitigate the actor-divergence issue that arises in this setup. We benchmark our method on continuous control tasks and demonstrate that it offers a significant improvement over the underlying model-based and model-free algorithms.
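To make the planning objective concrete, the sketch below shows one way the pieces described above fit together: roll out candidate action sequences for H steps under the learned model, score each rollout by its accumulated reward plus the terminal value estimate, and bias candidates toward the actor's proposals ("actor-guided" sampling). This is a minimal random-shooting illustration under assumed interfaces, not the paper's implementation (which would use a stronger trajectory optimizer such as CEM); all names here, including dynamics_model, reward_fn, value_fn, and actor, are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical interfaces (assumptions, not the paper's released code):
#   dynamics_model(state, action) -> next_state   (learned model)
#   reward_fn(state, action)      -> scalar reward
#   value_fn(state)               -> estimated return beyond the horizon
#   actor(state)                  -> action proposed by the off-policy actor

def plan_action(state, dynamics_model, reward_fn, value_fn, actor,
                horizon=10, n_candidates=500, action_dim=2,
                noise_std=0.3, gamma=0.99):
    """Score H-step rollouts under the learned model and return the first
    action of the best candidate. The terminal value function stands in
    for all rewards beyond the horizon, avoiding myopic planning."""
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        s, total, discount = state, 0.0, 1.0
        first_action = None
        for t in range(horizon):
            # Actor-guided sampling: perturb the actor's proposal
            # rather than sampling actions uniformly at random.
            a = actor(s) + noise_std * np.random.randn(action_dim)
            if t == 0:
                first_action = a
            total += discount * reward_fn(s, a)
            s = dynamics_model(s, a)
            discount *= gamma
        # Attach the learned value function at the end of the horizon.
        total += discount * value_fn(s)
        if total > best_return:
            best_return, best_first_action = total, first_action
    return best_first_action  # execute this action, then replan
```

Without the value_fn term, the planner would optimize only the next H steps of reward; with it, the H-step plan is scored as an estimate of the full infinite-horizon return.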


