On-Policy Model Errors in Reinforcement Learning

10/15/2021
by Lukas P. Fröhlich, et al.

Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model errors and bias can render learning unstable or sub-optimal. In this paper, we present a novel method that combines real-world data and a learned model in order to get the best of both worlds. The core idea is to exploit the real-world data for on-policy predictions and use the learned model only to generalize to different actions. Specifically, we use the data as time-dependent on-policy correction terms on top of a learned model, to retain the ability to generate data without accumulating errors over long prediction horizons. We motivate this method theoretically and show that it counteracts an error term for model-based policy improvement. Experiments on MuJoCo and PyBullet benchmarks show that our method can drastically improve existing model-based approaches without introducing additional tuning parameters.
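The idea of a time-dependent on-policy correction can be illustrated on a toy problem. The following is a minimal sketch and not the authors' implementation: it assumes a deterministic scalar system, a deliberately biased stand-in for the learned model, and an additive correction equal to the observed next state minus the model's prediction at the recorded state and action. The names `true_step`, `learned_model`, and `rollout_corrected` are hypothetical and chosen only for this example.

```python
import numpy as np


def true_step(s, a):
    # Hypothetical "real" dynamics, unknown to the learner.
    return 0.9 * s + a


def learned_model(s, a):
    # Hypothetical learned model with a deliberate bias.
    return 0.8 * s + a + 0.05


def rollout(step_fn, s0, actions):
    # Roll a one-step function forward over a sequence of actions.
    states = [s0]
    for a in actions:
        states.append(step_fn(states[-1], a))
    return np.array(states)


def rollout_corrected(real_states, real_actions, actions):
    # Model rollout with a time-dependent correction term (assumed form):
    # c_t = s_real[t+1] - model(s_real[t], a_real[t]).
    s = real_states[0]
    states = [s]
    for t, a in enumerate(actions):
        correction = real_states[t + 1] - learned_model(real_states[t], real_actions[t])
        s = learned_model(s, a) + correction
        states.append(s)
    return np.array(states)


horizon = 20
real_actions = np.full(horizon, 0.1)
real_states = rollout(true_step, 1.0, real_actions)

# On-policy: replaying the recorded actions.
model_on = rollout(learned_model, 1.0, real_actions)
corrected_on = rollout_corrected(real_states, real_actions, real_actions)
err_model_on = np.abs(model_on - real_states).max()
err_corrected_on = np.abs(corrected_on - real_states).max()

# Off-policy: slightly different actions, compared against the true system.
new_actions = real_actions + 0.02
truth_off = rollout(true_step, 1.0, new_actions)
model_off = rollout(learned_model, 1.0, new_actions)
corrected_off = rollout_corrected(real_states, real_actions, new_actions)
err_model_off = np.abs(model_off - truth_off).max()
err_corrected_off = np.abs(corrected_off - truth_off).max()

print("on-policy  max error: model", err_model_on, "corrected", err_corrected_on)
print("off-policy max error: model", err_model_off, "corrected", err_corrected_off)
```

When the recorded actions are replayed, the correction cancels the model's prediction and the rollout reproduces the recorded trajectory up to floating-point error, whereas the uncorrected model drifts. For nearby actions, the model only has to predict the deviation from the recorded trajectory. In this toy setup that keeps the error smaller than the plain model rollout, though nothing here says how the effect scales to the benchmarks mentioned above.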

research
11/28/2019

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

Model-based reinforcement learning algorithms tend to achieve higher sam...
research
06/19/2019

When to Trust Your Model: Model-Based Policy Optimization

Designing effective model-based reinforcement learning algorithms is dif...
research
06/08/2020

Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors

Model usage is the central challenge of model-based reinforcement learni...
research
07/12/2018

The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach

Deep reinforcement learning has recently shown many impressive successes...
research
06/06/2022

Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL

Model-based reinforcement learning promises to learn an optimal policy f...
research
03/17/2022

Investigating Compounding Prediction Errors in Learned Dynamics Models

Accurately predicting the consequences of agents' actions is a key prere...
research
01/15/2022

Physical Derivatives: Computing policy gradients by physical forward-propagation

Model-free and model-based reinforcement learning are two ends of a spec...
