Reinforcement Learning with a Corrupted Reward Channel

05/23/2017
by Tom Everitt, et al.

No real-world reward function is perfect. Sensory errors and software bugs may result in reinforcement learning (RL) agents observing higher (or lower) rewards than they should. For example, an RL agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.
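
The contrast between a pure reward maximiser and a randomising agent can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the ten reward values, the single corrupt state, the threshold of 0.7, and the helper names `greedy_state`, `candidate_states`, `randomised_state` and `mean_true_reward`. It shows only the general idea that choosing at random among states that look good enough, rather than always taking the best-looking one, limits how much a single corrupt reward reading can cost.

```python
import random

# Hypothetical toy data (not from the paper): true rewards of ten states.
TRUE_REWARD = [0.0, 0.5, 0.6, 0.7, 0.8, 0.6, 0.7, 0.8, 0.5, 0.6]

# The agent only sees observed rewards. Assumption: state 0 is corrupt and
# reports the maximum reward although its true reward is 0.
OBSERVED_REWARD = list(TRUE_REWARD)
OBSERVED_REWARD[0] = 1.0


def greedy_state(observed):
    """Return the state with the highest observed reward."""
    return max(range(len(observed)), key=lambda s: observed[s])


def candidate_states(observed, threshold):
    """Return all states whose observed reward is at least `threshold`."""
    return [s for s, r in enumerate(observed) if r >= threshold]


def randomised_state(observed, threshold, rng):
    """Pick uniformly at random among the states above the threshold."""
    return rng.choice(candidate_states(observed, threshold))


def mean_true_reward(states, true_reward):
    """Average true reward over `states`, i.e. the expected true reward
    of a uniform random choice among them."""
    return sum(true_reward[s] for s in states) / len(states)


if __name__ == "__main__":
    g = greedy_state(OBSERVED_REWARD)
    print("greedy picks state", g, "with true reward", TRUE_REWARD[g])

    candidates = candidate_states(OBSERVED_REWARD, threshold=0.7)
    print("randomising agent chooses among", candidates)
    print("expected true reward:", mean_true_reward(candidates, TRUE_REWARD))

    rng = random.Random(0)  # seeded only to make the sample reproducible
    print("one sampled choice:", randomised_state(OBSERVED_REWARD, 0.7, rng))
```

In this toy setting the greedy agent always lands on the corrupt state, while the randomising agent lands there only as often as the corrupt state appears among the candidates. The threshold trades off the two risks: a higher value means fewer candidates and more weight on any corrupt state among them, a lower value admits more genuinely mediocre states.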

