Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

07/03/2019
by Falcon Z. Dai, et al.

We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that MEHC replaces diameter in the upper bound on the optimal value span of an extended MDP, thereby refining the associated regret upper bounds of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. We further establish that, in a large class of MDPs with finite MEHC and unsaturated optimal average rewards, shaping can decrease or increase MEHC by at most a factor of two.
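The potential-based reward shaping mentioned above adds a telescoping term to each transition reward. The sketch below is a minimal illustration, not code from the paper: it uses the gamma=1 form r'(s, a, s') = r(s, a, s') + Φ(s') − Φ(s), consistent with the average-reward setting, and all names (`shape_rewards`, the toy trajectory, the potential values) are illustrative assumptions.

```python
def shape_rewards(rewards, states, potential):
    """Apply potential-based reward shaping [NHR99] with gamma = 1:
    r'(s, a, s') = r(s, a, s') + phi(s') - phi(s).
    rewards[i] is the reward for the transition states[i] -> states[i+1]."""
    return [r + potential[states[i + 1]] - potential[states[i]]
            for i, r in enumerate(rewards)]

# Toy trajectory that returns to its start state: s0 -> s1 -> s2 -> s0.
states = ["s0", "s1", "s2", "s0"]
rewards = [1.0, 0.0, 2.0]
phi = {"s0": 0.0, "s1": 5.0, "s2": -3.0}

shaped = shape_rewards(rewards, states, phi)
# The shaping terms telescope: the total shaped reward differs from the
# original total only by phi(s_T) - phi(s_0), so it is unchanged on a loop.
assert abs(sum(shaped) - sum(rewards)) < 1e-9
```

Because the correction telescopes along any trajectory, shaping leaves optimal policies intact while redistributing reward across transitions, which is exactly why equivalent reward functions can differ in informativeness (MEHC) without changing the underlying task.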


research

06/12/2021
Markov Decision Processes with Long-Term Average Constraints
We consider the problem of constrained Markov Decision Process (CMDP) wh...

08/28/2023
On Reward Structures of Markov Decision Processes
A Markov decision process can be parameterized by a transition kernel an...

04/15/2021
Stochastic Processes with Expected Stopping Time
Markov chains are the de facto finite-state model for stochastic dynamic...

06/20/2019
Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process
We tackle the problem of acting in an unknown finite and discrete Markov...

06/20/2018
RUDDER: Return Decomposition for Delayed Rewards
We propose a novel reinforcement learning approach for finite Markov dec...

09/20/2022
Adaptive and Collaborative Bathymetric Channel-Finding Approach for Multiple Autonomous Marine Vehicles
This paper reports an investigation into the problem of rapid identifica...

05/04/2023
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
We investigate an infinite-horizon average reward Markov Decision Proces...
