Calculus on MDPs: Potential Shaping as a Gradient

08/20/2022
by Erik Jenner, et al.

In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce. A particularly well-known and important example is potential shaping, a class of functions that can be added to any reward function without changing the optimal policy set under arbitrary transition dynamics. Potential shaping is conceptually similar to potentials, conservative vector fields and gauge transformations in math and physics, but this connection has not previously been formally explored. We develop a formalism for discrete calculus on graphs that abstract a Markov Decision Process, and show how potential shaping can be formally interpreted as a gradient within this framework. This allows us to strengthen results from Ng et al. (1999) describing conditions under which potential shaping is the only additive reward transformation to always preserve optimal policies. As an additional application of our formalism, we define a rule for picking a single unique reward function from each potential shaping equivalence class.
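For reference, potential shaping in the sense of Ng et al. (1999) adds a term F(s, a, s') = γΦ(s') − Φ(s) to the reward for some potential Φ defined over states. With γ = 1 this is exactly the change in Φ along the edge s → s', which is the discrete-gradient reading the abstract refers to. The paper's own formalism may treat discounting differently. The sketch below checks the policy-invariance claim numerically on a small random finite MDP. It is written in Python with NumPy, and the helper names `q_value_iteration` and `potential_shaping`, the random MDP, and the seed are illustrative assumptions rather than code from the paper. It also checks the standard consequence that the shaped optimal Q-function equals the original one shifted by −Φ(s), so the greedy action in every state is unchanged.

```python
import numpy as np


def q_value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Optimal Q-function of a finite MDP via Q-value iteration (illustrative helper).

    P[s, a, s'] : transition probabilities (each P[s, a, :] sums to 1)
    R[s, a, s'] : reward received on the transition s --a--> s'
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)  # V(s') = max_a' Q(s', a')
        # Bellman optimality backup: Q(s, a) = E_{s'}[R(s, a, s') + gamma * V(s')]
        Q_new = np.einsum("ijk,ijk->ij", P, R + gamma * V[None, None, :])
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


def potential_shaping(phi, gamma, n_actions):
    """Shaping term F(s, a, s') = gamma * phi(s') - phi(s), identical for every action a."""
    F = gamma * phi[None, :] - phi[:, None]               # shape (S, S')
    return np.repeat(F[:, None, :], n_actions, axis=1)    # shape (S, A, S')


# Hypothetical example MDP: random dynamics, random rewards, random potential.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.normal(size=(S, A, S))
phi = rng.normal(size=S)

Q = q_value_iteration(P, R, gamma)
Q_shaped = q_value_iteration(P, R + potential_shaping(phi, gamma, A), gamma)

# Same greedy (optimal) policy, and Q_shaped(s, a) = Q(s, a) - phi(s).
print(np.array_equal(Q.argmax(axis=1), Q_shaped.argmax(axis=1)))
print(np.allclose(Q_shaped, Q - phi[:, None], atol=1e-6))
```

Because the shift −Φ(s) depends only on the state, it cannot change which action maximizes Q in that state. This is why the invariance holds for any transition dynamics P.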

research
09/22/2022

Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning

While Reinforcement Learning (RL) aims to train an agent from a reward f...
research
02/14/2012

A Geometric Traversal Algorithm for Reward-Uncertain MDPs

Markov decision processes (MDPs) are widely used in modeling decision ma...
research
06/24/2020

Quantifying Differences in Reward Functions

For many tasks, the reward function is too complex to be specified proce...
research
08/28/2023

On Reward Structures of Markov Decision Processes

A Markov decision process can be parameterized by a transition kernel an...
research
04/16/2018

Distribution Estimation in Discounted MDPs via a Transformation

Although the general deterministic reward function in MDPs takes three a...
research
08/20/2021

Plug and Play, Model-Based Reinforcement Learning

Sample-efficient generalisation of reinforcement learning approaches hav...
research
09/10/2021

Potential-based Reward Shaping in Sokoban

Learning to solve sparse-reward reinforcement learning problems is diffi...
