Avoiding Model Estimation in Robust Markov Decision Processes with a Generative Model

02/02/2023
by Wenhao Yang, et al.

Robust Markov Decision Processes (MDPs) are attracting growing attention as a way to learn policies that are less sensitive to changes in the environment, and an increasing number of works analyze their sample efficiency. However, most of these works study robust MDPs in a model-based regime, where the transition probability must be estimated and stored, requiring 𝒪(|𝒮|^2|𝒜|) memory. A common way to solve robust MDPs is to formulate them as a distributionally robust optimization (DRO) problem. Solving a DRO problem is non-trivial, however, so prior works typically assume a strong oracle that returns the optimal solution of the DRO problem. To remove the need for an oracle, we first transform the original robust MDPs into an alternative form that allows us to solve them with stochastic gradient methods, and we prove that the alternative form still preserves robustness. With this new formulation, we devise a sample-efficient algorithm that solves robust MDPs in a model-free regime, requiring only 𝒪(|𝒮||𝒜|) memory and no oracle. Finally, we validate our theoretical findings via numerical experiments and demonstrate the efficiency of solving the alternative form of robust MDPs.
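
The abstract contrasts model-based methods, which estimate and store a transition tensor with |𝒮|²|𝒜| entries, with a model-free approach that avoids both that estimate and the DRO oracle. The sketch below is only an illustration of why an inner DRO step need not require the full model: it assumes a KL-divergence uncertainty set of radius `rho` around the nominal transition (an assumption; the abstract does not name the uncertainty set) and uses the standard convex dual of a KL-constrained worst-case expectation, which can be evaluated from generative-model samples of V(s'). It is not the paper's alternative form or algorithm, and the names `kl_dual` and `robust_expectation` are hypothetical.

```python
import math
import random

# ASSUMPTION: KL uncertainty set {P : KL(P || P0) <= rho}, rho > 0.
# Standard duality: inf_P E_P[V] = sup_{lam > 0} -lam*log E_P0[exp(-V/lam)] - lam*rho.
# This illustrates a generic DRO step, not the paper's alternative formulation.


def kl_dual(lam, values, rho):
    """Dual objective g(lam), with E_P0 replaced by the average over `values`."""
    m = min(values)
    # Shift by the minimum (log-sum-exp trick) so every exponent is <= 0.
    s = sum(math.exp(-(v - m) / lam) for v in values) / len(values)
    return m - lam * math.log(s) - lam * rho


def robust_expectation(values, rho, iters=200):
    """Approximate the KL worst-case expectation by maximizing the concave dual.

    `values` holds V(s') for next states s' sampled from the nominal model,
    so only samples are needed, never the full transition tensor.
    """
    lo = 1e-8
    # Jensen's inequality gives lam* <= (mean - min) / rho, so [lo, hi] brackets it.
    hi = (max(values) - min(values)) / rho + 1e-6
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    gc, gd = kl_dual(c, values, rho), kl_dual(d, values, rho)
    # Golden-section search for the maximum of a concave (hence unimodal) function.
    for _ in range(iters):
        if gc < gd:
            a, c, gc = c, d, gd
            d = a + inv_phi * (b - a)
            gd = kl_dual(d, values, rho)
        else:
            b, d, gd = d, c, gc
            c = b - inv_phi * (b - a)
            gc = kl_dual(c, values, rho)
    return max(gc, gd)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical samples of V(s') drawn through a generative model.
    next_values = [random.gauss(1.0, 0.5) for _ in range(1000)]
    nominal = sum(next_values) / len(next_values)
    for rho in (0.01, 0.1, 0.5):
        print(rho, round(nominal, 3), round(robust_expectation(next_values, rho), 3))
```

For a sense of scale, with |𝒮| = 1000 and |𝒜| = 10, a stored transition tensor has 1000² × 10 = 10⁷ entries, while a Q-table has 1000 × 10 = 10⁴. Golden-section search is simply a convenient choice for this one-dimensional concave dual; a stochastic gradient step on λ could replace it when samples arrive in a stream.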

research
01/31/2023

An Efficient Solution to s-Rectangular Robust Markov Decision Processes

We present an efficient robust value iteration for s-rectangular robust M...
research
01/31/2023

Policy Gradient for s-Rectangular Robust Markov Decision Processes

We present a novel robust policy gradient method (RPG) for s-rectangular...
research
03/12/2023

Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization

Robust Markov decision processes (MDPs) aim to handle changing or partia...
research
05/28/2022

Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

Robust Markov decision processes (MDPs) provide a general framework to m...
research
05/17/2023

Model-Free Robust Average-Reward Reinforcement Learning

Robust Markov decision processes (MDPs) address the challenge of model u...
research
12/12/2018

Transition Tensor Markov Decision Processes: Analyzing Shot Policies in Professional Basketball

In this paper we model basketball plays as episodes from team-specific n...
research
01/30/2022

The Geometry of Robust Value Functions

The space of value functions is a fundamental concept in reinforcement l...