Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification

08/30/2023
by Jasmina Gajcin, et al.

A well-defined reward function is crucial for successful training of a reinforcement learning (RL) agent. However, defining a suitable reward function is a notoriously challenging task, especially in complex, multi-objective environments. Developers often have to resort to starting with an initial, potentially misspecified reward function and iteratively adjusting its parameters based on the observed learned behavior. In this work, we aim to automate this process by proposing ITERS, an iterative reward shaping approach that uses human feedback to mitigate the effects of a misspecified reward function. Our approach allows the user to provide trajectory-level feedback on the agent's behavior during training, which is integrated as a reward shaping signal in the following training iteration. We also allow the user to provide explanations of their feedback, which are used to augment the feedback and reduce user effort and feedback frequency. We evaluate ITERS in three environments and show that it can successfully correct misspecified reward functions.
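To make the iterative loop described above concrete, below is a minimal Python sketch of how such an approach might be structured: RL training iterations alternate with trajectory-level human feedback, which is folded into a shaping term used in the next iteration. The helper names (train_agent, collect_trajectory_feedback, augment_with_explanations) are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of an ITERS-style loop (illustrative only; names are hypothetical).
from collections import defaultdict

def train_agent(shaping, num_steps):
    """Stand-in for one RL training iteration using reward = env reward + shaping.
    Returns trajectories sampled from the learned policy."""
    return [[("state_a", "action_x"), ("state_b", "action_y")]]  # dummy trajectories

def collect_trajectory_feedback(trajectories):
    """Stand-in for the user labelling whole trajectories (e.g., +1 approve / -1 disapprove)."""
    return {tuple(t): -1.0 for t in trajectories}  # dummy disapproval of every trajectory

def augment_with_explanations(feedback):
    """Stand-in for spreading a trajectory-level label to the state-action pairs
    the user's explanation points to, reducing how often feedback is needed."""
    shaping = defaultdict(float)
    for traj, label in feedback.items():
        for state, action in traj:
            shaping[(state, action)] += label
    return shaping

shaping = defaultdict(float)  # first iteration trains on the (possibly misspecified) env reward only
for iteration in range(5):
    trajectories = train_agent(shaping=shaping, num_steps=10_000)
    feedback = collect_trajectory_feedback(trajectories)
    shaping = augment_with_explanations(feedback)  # applied in the following iteration
```

In this sketch the shaping term is a simple per-state-action bonus; the actual paper may integrate feedback and explanations differently.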
