Shaping Proto-Value Functions via Rewards

In this paper, we combine task-dependent reward shaping and task-independent proto-value functions to obtain reward-dependent proto-value functions (RPVFs). In constructing the RPVFs, we make use of the immediate rewards, which are available during the sampling phase but are not used in standard PVF construction. We show via experiments that learning with an RPVF-based representation is better than learning with reward shaping or PVFs alone. In particular, when the state space is symmetric and the rewards are asymmetric, the RPVFs capture the asymmetry better than the PVFs.
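For context, proto-value functions are typically built as the smoothest eigenvectors of the graph Laplacian over the state-adjacency graph induced by sampled transitions. The sketch below shows that standard construction, plus one plausible way the immediate rewards observed during sampling could be folded into the edge weights. The reward-weighted variant, the `(state, next_state, reward)` transition format, and the helper names are illustrative assumptions, not necessarily the exact RPVF construction used in the paper.

```python
import numpy as np

def pvf_basis(transitions, n_states, k):
    """Standard PVFs: the k smoothest eigenvectors of the normalized graph
    Laplacian of the adjacency graph built from sampled transitions."""
    W = np.zeros((n_states, n_states))
    for s, s_next, _r in transitions:
        W[s, s_next] = 1.0
        W[s_next, s] = 1.0                     # symmetrize the adjacency graph
    d = W.sum(axis=1)
    d[d == 0] = 1.0                            # guard against isolated states
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(n_states) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]                      # basis features, one row per state

def reward_weighted_basis(transitions, n_states, k):
    """Illustrative reward-dependent variant (an assumption, not the paper's
    definition): modulate edge weights by the immediate rewards seen while
    sampling, so the resulting basis reflects reward asymmetries."""
    W = np.zeros((n_states, n_states))
    for s, s_next, r in transitions:
        w = 1.0 + abs(r)                       # reward-modulated weight (assumed form)
        W[s, s_next] = max(W[s, s_next], w)
        W[s_next, s] = max(W[s_next, s], w)
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(n_states) - D_inv_sqrt @ W @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]
```

Either basis can then be used as the feature matrix for linear value-function approximation; the reward-weighted version simply biases the spectral features toward regions of the state graph where large rewards were observed during sampling.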
