Robust Average-Reward Markov Decision Processes

01/02/2023
by Yue Wang, et al.

In robust Markov decision processes (MDPs), uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average reward as the discount factor γ goes to 1, and moreover that, when γ is sufficiently close to 1, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward MDP. We further design a robust dynamic programming approach and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly, without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
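
The two approaches described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the uncertainty set is taken to be a finite list of candidate next-state distributions for each state-action pair, the two-state example (`R`, `P`) is made up, and the function names (`worst_case`, `robust_backup`, `robust_rvi`, `robust_discounted_vi`) are hypothetical. The relative value iteration subtracts the value at a reference state after every worst-case backup. The discounted iteration is included only to compare (1 - γ) times the robust discounted value against the robust average reward, which is our reading of the convergence claim.

```python
from typing import List, Tuple

Dist = List[float]  # a probability distribution over next states


def worst_case(candidates: List[Dist], v: List[float]) -> float:
    """Return min over the finite candidate set of the expected value p . v."""
    return min(sum(pi * vi for pi, vi in zip(p, v)) for p in candidates)


def robust_backup(r, P, v, gamma: float = 1.0) -> Tuple[List[float], List[int]]:
    """One worst-case Bellman backup; returns new values and greedy actions."""
    values, policy = [], []
    for s in range(len(r)):
        q = [r[s][a] + gamma * worst_case(P[s][a], v) for a in range(len(r[s]))]
        best = max(range(len(q)), key=q.__getitem__)
        values.append(q[best])
        policy.append(best)
    return values, policy


def robust_rvi(r, P, ref: int = 0, tol: float = 1e-10, max_iter: int = 10_000):
    """Relative value iteration with worst-case backups (illustrative sketch).

    Returns (gain estimate, relative values with v[ref] == 0, greedy policy).
    """
    v = [0.0] * len(r)
    gain, policy = 0.0, [0] * len(r)
    for _ in range(max_iter):
        w, policy = robust_backup(r, P, v)
        gain = w[ref]                      # value at the reference state
        v_new = [x - gain for x in w]      # re-center so that v_new[ref] == 0
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return gain, v_new, policy
        v = v_new
    return gain, v, policy


def robust_discounted_vi(r, P, gamma: float, n_iter: int):
    """Plain value iteration with worst-case backups and discount gamma."""
    v, policy = [0.0] * len(r), [0] * len(r)
    for _ in range(n_iter):
        v, policy = robust_backup(r, P, v, gamma)
    return v, policy


# Hypothetical two-state, two-action example (not from the paper).
R = [[1.0, 0.0], [0.0, 0.5]]                  # R[s][a]
P = [                                         # P[s][a] = candidate distributions
    [[[0.9, 0.1], [0.7, 0.3]], [[0.5, 0.5]]],
    [[[0.6, 0.4], [0.4, 0.6]], [[0.1, 0.9]]],
]

if __name__ == "__main__":
    gain, bias, policy = robust_rvi(R, P)
    v_disc, _ = robust_discounted_vi(R, P, gamma=0.999, n_iter=20_000)
    print("robust average reward (RVI):", round(gain, 4))
    print("greedy policy:", policy)
    print("(1 - gamma) * discounted value:", [round(0.001 * x, 4) for x in v_disc])
```

On this toy instance the relative value iteration settles at a gain of about 0.625, and the rescaled discounted values at γ = 0.999 land within roughly 0.001 of that number. A finite candidate set keeps the inner minimization trivial; other uncertainty sets would need their own worst-case solver in place of `worst_case`.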


