Non-asymptotic Performances of Robust Markov Decision Processes
In this paper, we study the non-asymptotic performance of the optimal robust policy in terms of the robust value function under the true transition dynamics. The optimal robust policy is computed from a generative model or an offline dataset, without access to the true transition dynamics. In particular, we consider three uncertainty sets, the L_1, χ^2 and KL balls, under both the (s,a)-rectangular and s-rectangular assumptions. Our results show that, under the (s,a)-rectangular assumption on the uncertainty sets, the sample complexity is about O(|𝒮|^2|𝒜|/(ε^2 ρ^2 (1-γ)^4)) in the generative model setting and O(|𝒮|/(ν_min ε^2 ρ^2 (1-γ)^4)) in the offline dataset setting. While prior work on non-asymptotic performance is restricted to the KL ball and the (s,a)-rectangular assumption, we also extend our results to the more general s-rectangular assumption, which leads to a larger sample complexity than the (s,a)-rectangular assumption.
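To get a feel for how the two (s,a)-rectangular rates above scale, the following minimal sketch evaluates the expressions inside the O(·) terms. It is an illustration only: constants and any logarithmic factors are dropped, so the returned numbers are not sample counts and are meaningful only as ratios. The function and parameter names (`generative_rate`, `offline_rate`, `n_states`, `n_actions`, `nu_min`) are hypothetical and do not come from the paper.

```python
def generative_rate(n_states, n_actions, eps, rho, gamma):
    """|S|^2 |A| / (eps^2 rho^2 (1 - gamma)^4), constants and logs omitted."""
    return n_states**2 * n_actions / (eps**2 * rho**2 * (1.0 - gamma) ** 4)


def offline_rate(n_states, nu_min, eps, rho, gamma):
    """|S| / (nu_min eps^2 rho^2 (1 - gamma)^4), constants and logs omitted."""
    return n_states / (nu_min * eps**2 * rho**2 * (1.0 - gamma) ** 4)


if __name__ == "__main__":
    # Assumed toy values, chosen only to show the scaling behaviour.
    base = generative_rate(n_states=10, n_actions=5, eps=0.1, rho=0.5, gamma=0.9)
    finer = generative_rate(n_states=10, n_actions=5, eps=0.05, rho=0.5, gamma=0.9)
    print("halving eps multiplies the generative rate by", finer / base)

    off = offline_rate(n_states=10, nu_min=0.01, eps=0.1, rho=0.5, gamma=0.9)
    print("offline / generative ratio at these values:", off / base)
```

Both expressions share the factor 1/(ε^2 ρ^2 (1-γ)^4), so their ratio depends only on |𝒮|, |𝒜| and ν_min.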