Specifying Behavior Preference with Tiered Reward Functions

12/07/2022
by Zhiyuan Zhou, et al.

Reinforcement-learning agents seek to maximize a reward signal through environmental interactions. As humans, our contribution to the learning process is through designing the reward function. Like programmers, we have a behavior in mind and have to translate it into a formal specification, namely rewards. In this work, we consider the reward-design problem in tasks formulated as reaching desirable states and avoiding undesirable states. To start, we propose a strict partial ordering of the policy space. We prefer policies that reach the good states faster and with higher probability while avoiding the bad states longer. Next, we propose an environment-independent tiered reward structure and show it is guaranteed to induce policies that are Pareto-optimal according to our preference relation. Finally, we empirically evaluate tiered reward functions on several environments and show they induce desired behavior and lead to fast learning.
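The preference relation can be made concrete with a small sketch. It assumes that each policy is summarized by two curves over a fixed horizon. The first gives the probability of having reached a good state by step t. The second gives the probability of having hit a bad state by step t. Under that assumption, "reach the good states faster and with higher probability while avoiding the bad states longer" reads as Pareto dominance over the two curves. This is a hypothetical formalization for illustration, not necessarily the paper's exact definition, and the function name `dominates` is our own.

```python
from typing import Sequence


def dominates(
    good_a: Sequence[float],
    bad_a: Sequence[float],
    good_b: Sequence[float],
    bad_b: Sequence[float],
) -> bool:
    """Return True if policy A is strictly preferred to policy B.

    Hypothetical summary (an assumption, not from the paper):
      good_x[t] = probability policy x has reached a good state by step t
      bad_x[t]  = probability policy x has hit a bad state by step t
    """
    if not (len(good_a) == len(bad_a) == len(good_b) == len(bad_b)):
        raise ValueError("all curves must cover the same horizon")

    # A must be at least as good at every step: it reaches good states
    # no later and hits bad states no sooner than B.
    weakly_better = all(ga >= gb for ga, gb in zip(good_a, good_b)) and all(
        ba <= bb for ba, bb in zip(bad_a, bad_b)
    )
    # A must also be strictly better at some step on at least one curve,
    # which makes the relation irreflexive (a strict partial order).
    strictly_better_somewhere = any(
        ga > gb for ga, gb in zip(good_a, good_b)
    ) or any(ba < bb for ba, bb in zip(bad_a, bad_b))
    return weakly_better and strictly_better_somewhere


# Illustrative (made-up) curves over a 3-step horizon.
fast = ([0.5, 0.9, 1.0], [0.0, 0.0, 0.0])   # safe and quick
slow = ([0.1, 0.5, 1.0], [0.0, 0.0, 0.0])   # safe but slower
risky = ([0.6, 0.9, 0.9], [0.1, 0.1, 0.1])  # quick early, sometimes fails

if __name__ == "__main__":
    print(dominates(*fast, *slow))   # fast is preferred to slow
    print(dominates(*fast, *risky))  # incomparable in both directions
    print(dominates(*risky, *fast))
```

Because the relation is only a partial order, some pairs of policies are incomparable, as with `fast` and `risky` here. This is why the abstract frames the guarantee in terms of Pareto-optimal policies rather than a single best policy.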
