Learning the Reward Function for a Misspecified Model

01/29/2018
by Erik Talvitie, et al.

In model-based reinforcement learning it is typical to treat the problems of learning the dynamics model and learning the reward function separately. However, when the dynamics model is flawed, it may generate erroneous states that would never occur in the true environment. A reward function trained only to map environment states to rewards (as is typical) would have little guidance in such states, because it never encounters them during training. This paper presents a novel error bound that accounts for the reward model's behavior in states sampled from the model. This bound is used to extend the existing Hallucinated DAgger-MC algorithm, which offers theoretical performance guarantees in deterministic MDPs without assuming that a perfect model can be learned. Empirically, this approach to reward learning can yield dramatic improvements in control performance when the dynamics model is flawed.
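The central idea above, training the reward model on the states the flawed dynamics model actually produces rather than only on real environment states, can be illustrated with a minimal Python sketch. It assumes that each model-sampled ("hallucinated") state is labelled with the reward observed at the same time step of the real trajectory when the model is rolled forward under the real actions. The toy chain environment, `flawed_model`, `TabularRewardModel`, and every other name here are hypothetical. This is not the paper's Hallucinated DAgger-MC algorithm or its error bound, only an illustration of why reward error on model-sampled states matters.

```python
from collections import defaultdict


def hallucinated_reward_pairs(states, actions, rewards, model_step, horizon):
    """Label model-generated states with the real rewards from the same time step.

    states, actions, rewards describe one real trajectory, where rewards[t]
    was observed after taking actions[t] in states[t]. From each real state,
    the (possibly flawed) model is rolled forward under the real actions for
    up to `horizon` steps, and every hallucinated state is paired with the
    reward the environment actually produced at that step (an assumed scheme).
    """
    pairs = []
    for i in range(len(actions)):
        s_hat = states[i]  # each rollout starts from a real state
        for t in range(i, min(i + horizon, len(actions))):
            pairs.append((s_hat, actions[t], rewards[t]))
            s_hat = model_step(s_hat, actions[t])
    return pairs


class TabularRewardModel:
    """Hypothetical tabular reward model: mean training target per (state, action)."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def fit(self, pairs):
        for s, a, r in pairs:
            self.total[(s, a)] += r
            self.count[(s, a)] += 1
        return self

    def predict(self, s, a):
        n = self.count.get((s, a), 0)
        return self.total[(s, a)] / n if n else 0.0  # unseen pairs default to 0


def mean_abs_reward_error(pairs, reward_model):
    """Average |real reward - predicted reward| over (hallucinated) state-action pairs."""
    return sum(abs(r - reward_model.predict(s, a)) for s, a, r in pairs) / len(pairs)


# Hypothetical toy chain: the true dynamics are s' = s + a, and a reward of 1
# is earned for taking action +1 in state 2.
def flawed_model(s, a):
    # The hypothetical learned model is wrong in state 1 and jumps to state 3.
    return s + 2 * a if s == 1 else s + a


states, actions, rewards = [0, 1, 2, 3], [1, 1, 1], [0.0, 0.0, 1.0]

# Typical approach: fit the reward model on real (state, action, reward) triples only.
real_only = TabularRewardModel().fit(zip(states[:-1], actions, rewards))

# Alternative: fit it on the states the flawed model generates.
hallucinated_pairs = hallucinated_reward_pairs(states, actions, rewards, flawed_model, horizon=3)
hallucinated = TabularRewardModel().fit(hallucinated_pairs)

print(real_only.predict(3, 1), hallucinated.predict(3, 1))  # 0.0 1.0
print(mean_abs_reward_error(hallucinated_pairs, real_only))  # 0.3333333333333333
print(mean_abs_reward_error(hallucinated_pairs, hallucinated))  # 0.0
```

In this toy run the flawed model skips state 2, so a reward model fit only to real states predicts no reward anywhere along the model's rollout from state 0, while the one fit to hallucinated states assigns the reward to state 3, where the model actually ends up. The hallucinated reward model is wrong about the true environment's state 3, but it is the more useful partner for planning inside this particular flawed model, which illustrates in miniature how reward learning can compensate for a flawed dynamics model.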



05/01/2021
Markov Rewards Processes with Impulse Rewards and Absorbing States
We study the expected accumulated reward for a discrete-time Markov rewa...

04/06/2023
Robust Decision-Focused Learning for Reward Transfer
Decision-focused (DF) model-based reinforcement learning has recently be...

12/05/2019
Learning Human Objectives by Evaluating Hypothetical Behavior
We seek to align agent behavior with a user's objectives in a reinforcem...

06/28/2020
Image Classification by Reinforcement Learning with Two-State Q-Learning
In this paper, a simple and efficient Hybrid Classifier is presented whi...

07/24/2023
Contrastive Example-Based Control
While many real-world problems that might benefit from reinforcement lea...

12/07/2022
Specifying Behavior Preference with Tiered Reward Functions
Reinforcement-learning agents seek to maximize a reward signal through e...

12/19/2016
Self-Correcting Models for Model-Based Reinforcement Learning
When an agent cannot represent a perfectly accurate model of its environ...
