Reward Advancement: Transforming Policy under Maximum Causal Entropy Principle

07/11/2019
by Guojun Wu, et al.

Many real-world human behaviors can be characterized as sequential decision-making processes, such as urban travelers' choices of transport modes and routes (Wu et al. 2017). Unlike choices controlled by machines, which in general follow perfect rationality and adopt the policy with the highest reward, human agents have been shown to make sub-optimal decisions under bounded rationality (Tao, Rohde, and Corcoran 2014). Such behaviors can be modeled using the maximum causal entropy (MCE) principle (Ziebart 2010). In this paper, we define and investigate a general reward transformation problem (namely, reward advancement): recovering the range of additional reward functions that transform the agent's policy from its original policy to a predefined target policy under the MCE principle. We show that, given an MDP and a target policy, there are infinitely many additional reward functions that can achieve the desired policy transformation. Moreover, we propose an algorithm to further extract the additional rewards with minimum "cost" to implement the policy transformation.
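The claim that infinitely many additional rewards reach the same target policy can be illustrated with a small numerical sketch. It is not the paper's algorithm: it assumes a discounted, infinite-horizon tabular MDP in which the MCE policy is the soft-max policy of the soft Bellman equations, and it assumes a target policy with strictly positive probabilities. The names `soft_value_iteration` and `additional_reward`, the toy MDP, and the discount `gamma` are all hypothetical. Under these assumptions, every choice of a value vector `V` yields a different additional reward that induces the same target policy.

```python
import numpy as np

def logsumexp(Q):
    """Numerically stable log-sum-exp over the action axis."""
    m = Q.max(axis=1)
    return m + np.log(np.exp(Q - m[:, None]).sum(axis=1))

def soft_value_iteration(P, r, gamma=0.9, iters=2000):
    """Soft-max policy for transitions P (S, A, S) and rewards r (S, A).

    Assumption: the discounted soft Bellman backup
    Q = r + gamma * E[V(s')],  V = logsumexp_a Q,  pi = exp(Q - V).
    """
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        V = logsumexp(r + gamma * P @ V)
    Q = r + gamma * P @ V
    return np.exp(Q - logsumexp(Q)[:, None])

def additional_reward(P, r, target, V, gamma=0.9):
    """Additional reward that makes `target` the soft-max policy.

    Any value vector V works: set Q = V + log(target), then solve the
    soft Bellman equation for the reward that produces this Q.
    """
    Q = V[:, None] + np.log(target)
    r_new = Q - gamma * P @ V
    return r_new - r

# Hypothetical toy MDP: 3 states, 2 actions, random dynamics and rewards.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))   # shape (S, A, S)
r = rng.normal(size=(3, 2))                  # original reward
target = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])

# Two different value vectors give two different additional rewards.
delta_a = additional_reward(P, r, target, np.zeros(3))
delta_b = additional_reward(P, r, target, np.array([1.0, -2.0, 0.5]))
policy_a = soft_value_iteration(P, r + delta_a)
policy_b = soft_value_iteration(P, r + delta_b)
```

Both `policy_a` and `policy_b` should match `target` up to numerical tolerance even though `delta_a` and `delta_b` differ. Choosing among such rewards by a cost criterion is the selection problem the abstract refers to; the sketch does not attempt it.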


research
04/17/2018

On Learning Intrinsic Rewards for Policy Gradient Methods

In many sequential decision making tasks, it is challenging to design re...
research
01/28/2022

Do You Need the Entropy Reward (in Practice)?

Maximum entropy (MaxEnt) RL maximizes a combination of the original task...
research
03/17/2019

Modeling and Optimization of Human-machine Interaction Processes via the Maximum Entropy Principle

We propose a data-driven framework to enable the modeling and optimizati...
research
05/29/2023

Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism

In this paper, we study offline Reinforcement Learning with Human Feedba...
research
01/25/2022

Dynamics-Aware Comparison of Learned Reward Functions

The ability to learn reward functions plays an important role in enablin...
research
05/04/2023

Maximum Causal Entropy Inverse Constrained Reinforcement Learning

When deploying artificial agents in real-world environments where they i...
research
12/25/2022

Linear Combinatorial Semi-Bandit with Causally Related Rewards

In a sequential decision-making problem, having a structural dependency ...
