First-order Policy Optimization for Robust Markov Decision Process

09/21/2022
by Yan Li, et al.

We consider the problem of solving robust Markov decision processes (MDPs), which involves a set of discounted, finite-state, finite-action-space MDPs with uncertain transition kernels. The goal of planning is to find a robust policy that optimizes the worst-case value against the transition uncertainty; the problem thus encompasses standard MDP planning as a special case. For (𝐬,𝐚)-rectangular uncertainty sets, we develop a policy-based first-order method, namely robust policy mirror descent (RPMD), and establish 𝒪(log(1/ϵ)) and 𝒪(1/ϵ) iteration complexities for finding an ϵ-optimal policy, using two increasing-stepsize schemes. These convergence results apply to any Bregman divergence, provided the policy space has a bounded radius measured by the divergence centered at the initial policy. Moreover, when the Bregman divergence corresponds to the squared Euclidean distance, we establish an 𝒪(max{1/ϵ, 1/(ηϵ^2)}) complexity for RPMD with any constant stepsize η. For a general class of Bregman divergences, a similar complexity is established for RPMD with constant stepsizes, provided the uncertainty set satisfies a relative strong convexity condition. We further develop a stochastic variant, named SRPMD, for the setting where first-order information is available only through online interactions with the nominal environment. For general Bregman divergences, we establish 𝒪(1/ϵ^2) and 𝒪(1/ϵ^3) sample complexities with the two increasing-stepsize schemes. For the Euclidean Bregman divergence, we establish an 𝒪(1/ϵ^3) sample complexity with constant stepsizes. To the best of our knowledge, all the aforementioned results appear to be new for policy-based first-order methods applied to the robust MDP problem.
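The abstract leaves the RPMD update itself implicit. For intuition only, below is a minimal sketch of what one such iteration might look like, assuming a finite (𝐬,𝐚)-rectangular uncertainty set represented by K candidate transition kernels and the KL divergence as the Bregman divergence (which yields a closed-form multiplicative update). The helper names (robust_q_values, rpmd_step), the finite kernel set, and the fixed-point evaluation loop are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def robust_q_values(policy, kernels, reward, gamma, n_iters=200):
    """Worst-case Q-values over a finite (s,a)-rectangular uncertainty set.

    policy : (S, A) array of action probabilities per state.
    kernels: (K, S, A, S) array of K candidate transition kernels
             (a finite stand-in for the uncertainty set; an assumption).
    reward : (S, A) array of rewards.
    """
    V = np.zeros(reward.shape[0])
    for _ in range(n_iters):  # fixed-point iteration on the robust Bellman operator
        Q = reward[None] + gamma * (kernels @ V)  # (K, S, A): Q under each kernel
        worst_Q = Q.min(axis=0)                   # adversary picks per (s, a): rectangularity
        V = (policy * worst_Q).sum(axis=1)        # robust value of the current policy
    return worst_Q

def rpmd_step(policy, kernels, reward, gamma, eta):
    """One RPMD-style step with the KL Bregman divergence (closed-form update)."""
    Q = robust_q_values(policy, kernels, reward, gamma)
    new_policy = policy * np.exp(eta * Q)         # exponentiated-gradient ascent on value
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Tiny usage example: a random 3-state, 2-action robust MDP with 4 candidate kernels.
rng = np.random.default_rng(0)
kernels = rng.dirichlet(np.ones(3), size=(4, 3, 2))  # shape (4, 3, 2, 3)
reward = rng.random((3, 2))
policy = np.full((3, 2), 0.5)
for _ in range(50):
    policy = rpmd_step(policy, kernels, reward, gamma=0.9, eta=1.0)
```

With the KL divergence the mirror-descent step reduces to an exponentiated-gradient update, so no explicit projection onto the simplex is needed; the paper's increasing-stepsize schemes would replace the constant η above with an iteration-dependent sequence.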


