Self-Correcting Models for Model-Based Reinforcement Learning

12/19/2016
by Erik Talvitie, et al.

When an agent cannot represent a perfectly accurate model of its environment's dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. Hallucinated Replay (Talvitie 2014) trains the model to "correct" itself when it produces errors, substantially improving MBRL with flawed models. This paper theoretically analyzes this approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. These results inspire an MBRL algorithm for deterministic MDPs with performance guarantees that are robust to model class limitations.
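The following is a minimal sketch that makes the two mechanisms in the abstract concrete. It assumes a hypothetical deterministic 1-D environment (`true_step`) and a hypothetical model with a constant one-step bias (`flawed_model`). It first shows a small one-step error compounding when the model's predictions are composed. It then builds training pairs in the spirit of Hallucinated Replay: each input is a state the model reached by rolling out from a real state, and each target is the real next state. All names, and the `horizon` parameter, are illustrative assumptions. This is not the paper's implementation.

```python
from typing import Callable, List, Tuple

State = int
Step = Callable[[State], State]


def true_step(s: State) -> State:
    """Hypothetical deterministic environment: each step adds 10."""
    return s + 10


def flawed_model(s: State) -> State:
    """Hypothetical learned model with a one-step error of exactly 1 everywhere."""
    return s + 11


def rollout(step: Step, s0: State, horizon: int) -> List[State]:
    """Compose predictions: each output is fed back in as the next input."""
    states = [s0]
    for _ in range(horizon):
        states.append(step(states[-1]))
    return states


def hallucinated_replay_pairs(
    model: Step, trajectory: List[State], horizon: int
) -> List[Tuple[State, State]]:
    """Build (input, target) training pairs in the spirit of Hallucinated Replay.

    From every real start state, roll the model forward for up to `horizon`
    steps. Each input is the model's own (possibly wrong) state and each
    target is the REAL next state, so the model is asked to correct itself.
    The first pair from each start is the ordinary one-step pair.
    """
    pairs: List[Tuple[State, State]] = []
    for start in range(len(trajectory) - 1):
        s_hat = trajectory[start]  # begin from a real state
        for t in range(start, min(start + horizon, len(trajectory) - 1)):
            pairs.append((s_hat, trajectory[t + 1]))
            s_hat = model(s_hat)  # continue from the model's own prediction
    return pairs


real = rollout(true_step, 0, 5)         # [0, 10, 20, 30, 40, 50]
imagined = rollout(flawed_model, 0, 5)  # [0, 11, 22, 33, 44, 55]
errors = [abs(r - m) for r, m in zip(real, imagined)]
print(errors)  # [0, 1, 2, 3, 4, 5]: a constant one-step error compounds

pairs = hallucinated_replay_pairs(flawed_model, real[:4], horizon=3)
print(pairs)
# [(0, 10), (11, 20), (22, 30), (10, 20), (21, 30), (20, 30)]
```

Roughly speaking, a model trained on pairs such as `(11, 20)` is penalized not only for its one-step error but also for failing to steer its own imagined states back toward the real trajectory. That is the "self-correction" the abstract relates to planning performance.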


Related research

Combating the Compounding-Error Problem with a Multi-step Model (05/30/2019)
Model-based reinforcement learning is an appealing framework for creatin...

Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning (10/24/2020)
Model-based reinforcement learning is a framework in which an agent lear...

Learning the Reward Function for a Misspecified Model (01/29/2018)
In model-based reinforcement learning it is typical to treat the problem...

Understanding the effect of varying amounts of replay per step (02/20/2023)
Model-based reinforcement learning uses models to plan, where the predic...

Episodic Memory for Learning Subjective-Timescale Models (10/03/2020)
In model-based learning, an agent's model is commonly defined over trans...

Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors (06/08/2020)
Model usage is the central challenge of model-based reinforcement learni...

Investigating Compounding Prediction Errors in Learned Dynamics Models (03/17/2022)
Accurately predicting the consequences of agents' actions is a key prere...
