Non-asymptotic Performances of Robust Markov Decision Processes

05/09/2021
by Wenhao Yang, et al.

In this paper, we study the non-asymptotic performance of an optimal robust policy, evaluated on the robust value function under the true transition dynamics. The optimal robust policy is solved from a generative model or an offline dataset, without access to the true transition dynamics. In particular, we consider three different uncertainty sets, the L_1, χ^2, and KL balls, under both (s,a)-rectangular and s-rectangular assumptions. Our results show that under the (s,a)-rectangular assumption on the uncertainty set, the sample complexity is about O(|𝒮|^2|𝒜| / (ε^2 ρ^2 (1-γ)^4)) in the generative-model setting and O(|𝒮| / (ν_min ε^2 ρ^2 (1-γ)^4)) in the offline-dataset setting. While prior work on non-asymptotic performance is restricted to the KL ball and the (s,a)-rectangular assumption, we further extend our results to the more general s-rectangular assumption, which leads to a larger sample complexity than the (s,a)-rectangular case.
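To make the (s,a)-rectangular setting concrete, the sketch below runs robust value iteration with an L_1 uncertainty ball of radius ρ around an empirical transition model estimated from a generative model. This is a minimal illustration, not the paper's implementation: the function names, parameters, and the closed-form inner minimization (which shifts probability mass from the highest-value next states to the lowest-value one) are assumptions chosen for clarity.

```python
import numpy as np

def worst_case_l1(p_hat, v, rho):
    """Inner minimization min_p p @ v over the L_1 ball
    {p in simplex : ||p - p_hat||_1 <= rho} around p_hat.
    A standard closed form: move mass from the highest-value
    states to the single lowest-value state."""
    p = p_hat.copy()
    s_min = np.argmin(v)
    # Each unit of mass moved changes the L_1 distance by 2,
    # so at most rho/2 may be moved; cap so p[s_min] <= 1.
    budget = min(rho / 2.0, 1.0 - p[s_min])
    p[s_min] += budget
    # Remove the same total mass from the largest-value states.
    for s in np.argsort(v)[::-1]:
        if s == s_min:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
        if budget <= 0:
            break
    return p

def robust_value_iteration(P_hat, R, rho, gamma, iters=500):
    """Robust value iteration under an (s,a)-rectangular L_1 ball.
    P_hat: empirical transitions, shape (S, A, S); R: rewards (S, A)."""
    S, A, _ = P_hat.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Adversary picks the worst model in the ball,
                # independently for each (s, a) pair (rectangularity).
                p_worst = worst_case_l1(P_hat[s, a], v, rho)
                q[s, a] = R[s, a] + gamma * p_worst @ v
        v = q.max(axis=1)
    return v, q.argmax(axis=1)
```

Rectangularity is what makes this tractable: because the adversary's choice decouples across (s,a) pairs, each Bellman backup only requires the one-dimensional inner problem above. For the χ^2 and KL balls, the inner minimization would instead be solved via its convex dual rather than in closed form.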
