Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

05/30/2023
by Mengmeng Li, et al.

We propose a policy gradient algorithm for robust infinite-horizon Markov Decision Processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. This prompts us to develop a projected Langevin dynamics algorithm tailored to the robust policy evaluation problem, which offers global optimality guarantees. We also propose a deterministic policy gradient method that solves the robust policy evaluation problem approximately, and we prove that the approximation error scales with a new measure of non-rectangularity of the uncertainty set. Numerical experiments show that our projected Langevin dynamics algorithm can escape local optima, while algorithms tailored to rectangular uncertainty fail to do so.
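The abstract does not spell out the update rule, so the following is only a minimal sketch of a generic projected Langevin dynamics iteration, not the paper's algorithm. It assumes the textbook form: a gradient step, plus Gaussian noise scaled by sqrt(2 * step / beta), followed by a Euclidean projection onto the feasible set. The probability simplex stands in for an uncertainty set of transition probabilities, and the names `project_simplex`, `projected_langevin`, `step`, `beta` and the toy objective are all hypothetical choices made for illustration.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0                  # shifted cumulative sums
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    tau = css[rho] / (rho + 1)                # threshold to subtract
    return np.maximum(v - tau, 0.0)

def projected_langevin(grad, x0, project, step=1e-2, beta=50.0,
                       n_iter=2000, seed=0):
    """Generic projected Langevin iteration (illustrative assumption):
    x <- Proj(x - step * grad(x) + sqrt(2 * step / beta) * noise)."""
    rng = np.random.default_rng(seed)
    x = project(np.asarray(x0, dtype=float))
    for _ in range(n_iter):
        noise = rng.standard_normal(x.shape)
        x = project(x - step * grad(x) + np.sqrt(2.0 * step / beta) * noise)
    return x

# Toy non-convex objective f(p) = -||p - c||^2 on the simplex (hypothetical).
# Its local minimizers sit at vertices, so plain projected gradient descent
# can get stuck depending on the starting point.
c = np.array([0.4, 0.35, 0.25])
grad_f = lambda p: -2.0 * (p - c)

p_final = projected_langevin(grad_f, np.ones(3) / 3, project_simplex)
```

The injected noise is what allows the iterate to leave the basin of a poor local optimum; larger `beta` means less noise, and in the limit the scheme reduces to projected gradient descent.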
