Reinforcement Learning with Perturbed Rewards

10/02/2018
by Yang Liu, et al.

Recent studies have shown that reinforcement learning (RL) models are vulnerable in noisy settings. The sources of noise differ across scenarios. For instance, in practice, the observed reward channel is often subject to noise (e.g., when rewards are collected through sensors), so observed rewards may not be credible. Also, in applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors. In this paper, we consider noisy RL problems in which the rewards observed by RL agents are generated according to a reward confusion matrix. We call such observed rewards perturbed rewards. We develop a robust RL framework, aided by an unbiased reward estimator, that enables RL agents to learn in noisy environments while observing only perturbed rewards. Our framework draws upon approaches for supervised learning with noisy data. The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that policies based on our estimated surrogate reward can achieve higher expected rewards and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 67.5 on average on five Atari games, when the error rates are 10 respectively.
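To make the idea of unbiased surrogate rewards concrete, here is a minimal sketch, not the authors' implementation. It assumes a finite set of reward levels and a confusion matrix that is already known and invertible (the abstract describes estimating it, which this sketch skips). The names `surrogate_rewards`, `confusion`, and `reward_levels`, and the example numbers, are hypothetical. Entry `confusion[i, j]` is taken to be the probability of observing level `j` when the true level is `i`; the surrogate values are then chosen so that their expectation under the noise equals the true reward.

```python
import numpy as np


def surrogate_rewards(confusion, reward_levels):
    """Return one surrogate value per observed reward level.

    Solves confusion @ r_hat = reward_levels, so that for every true
    level i the expected surrogate under the noise equals reward_levels[i].
    Assumes `confusion` is square, row-stochastic, and invertible.
    """
    confusion = np.asarray(confusion, dtype=float)
    reward_levels = np.asarray(reward_levels, dtype=float)
    return np.linalg.solve(confusion, reward_levels)


# Hypothetical binary example: true +1 is flipped with probability 0.1,
# true -1 is flipped with probability 0.3.
C = np.array([[0.9, 0.1],
              [0.3, 0.7]])
R = np.array([1.0, -1.0])

R_hat = surrogate_rewards(C, R)

# Unbiasedness check: averaging the surrogates over the noise
# recovers the true reward levels.
expected = C @ R_hat
```

An agent would replace each observed reward level `j` with `R_hat[j]` before its usual update. Note that the surrogate values are larger in magnitude than the true rewards, which suggests that removing bias in this way comes at the price of higher variance.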


research
04/21/2020

SIBRE: Self Improvement Based REwards for Reinforcement Learning

We propose a generic reward shaping approach for improving rate of conve...
research
05/02/2021

InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem

The temporal Credit Assignment Problem (CAP) is a well-known and challen...
research
01/18/2021

Stable deep reinforcement learning method by predicting uncertainty in rewards as a subtask

In recent years, a variety of tasks have been accomplished by deep reinf...
research
06/16/2021

Unbiased Methods for Multi-Goal Reinforcement Learning

In multi-goal reinforcement learning (RL) settings, the reward for each ...
research
12/12/2020

Semi-supervised reward learning for offline reinforcement learning

In offline reinforcement learning (RL) agents are trained using a logged...
research
09/19/2023

PDRL: Multi-Agent based Reinforcement Learning for Predictive Monitoring

Reinforcement learning has been increasingly applied in monitoring appli...
research
12/05/2019

Reinforcement Learning Upside Down: Don't Predict Rewards – Just Map Them to Actions

We transform reinforcement learning (RL) into a form of supervised learn...
