Towards Minimax Optimality of Model-based Robust Reinforcement Learning

by Pierre Clavier, et al.

We study the sample complexity of obtaining an ϵ-optimal policy in robust discounted Markov decision processes (RMDPs), given only access to a generative model of the nominal kernel. This problem is widely studied in the non-robust case, where it is known that any planning approach applied to an empirical MDP estimated with 𝒪̃(H^3|S||A|/ϵ^2) samples provides an ϵ-optimal policy, which is minimax optimal. Results in the robust case are much scarcer. For sa-rectangular (resp. s-rectangular) uncertainty sets, the best known sample complexity is 𝒪̃(H^4|S|^2|A|/ϵ^2) (resp. 𝒪̃(H^4|S|^2|A|^2/ϵ^2)), obtained for specific algorithms and for uncertainty sets based on the total variation (TV), KL, or chi-square divergences. In this paper, we consider uncertainty sets defined by an L_p-ball (which recovers the TV case) and study the sample complexity of any planning algorithm (with a high-accuracy guarantee on the solution) applied to an empirical RMDP estimated using the generative model. In the general case, we prove a sample complexity of 𝒪̃(H^4|S||A|/ϵ^2) for both the sa- and s-rectangular cases (improvements by factors of |S| and |S||A|, respectively). When the size of the uncertainty set is small enough, we improve the sample complexity to 𝒪̃(H^3|S||A|/ϵ^2), which, for the first time, matches the lower bound for the non-robust case, as well as a robust lower bound.
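The model-based recipe analyzed in the abstract (estimate an empirical kernel from the generative model, then plan on the resulting empirical RMDP) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It takes the sa-rectangular case with an L1 ball (the TV-type member of the L_p family) of radius `beta` around each empirical row P̂(·|s,a), uses robust value iteration as one example of a planner with a high-accuracy guarantee, and simulates the generative model by sampling from a known kernel `P_true`. The function names `estimate_kernel`, `worst_case_l1`, and `robust_value_iteration`, and the parameters `n_samples`, `beta`, and `tol`, are hypothetical.

```python
# Hypothetical sketch: function names and the L1-ball radius convention are
# illustrative assumptions, not the paper's code.
import numpy as np


def estimate_kernel(P_true, n_samples, rng):
    """Empirical nominal kernel from n_samples generative-model calls per (s, a).

    P_true has shape (S, A, S) and stands in for the generative model here.
    """
    S, A, _ = P_true.shape
    P_hat = np.zeros_like(P_true, dtype=float)
    for s in range(S):
        for a in range(A):
            draws = rng.choice(S, size=n_samples, p=P_true[s, a])
            P_hat[s, a] = np.bincount(draws, minlength=S) / n_samples
    return P_hat


def worst_case_l1(p, v, beta):
    """Minimize q @ v over {q in simplex : ||q - p||_1 <= beta}.

    Greedy solution for the L1 ball: move up to beta / 2 of probability mass
    onto the lowest-value state, taking it from the highest-value states first.
    """
    q = np.array(p, dtype=float)
    i_min = int(np.argmin(v))
    shift = min(beta / 2.0, 1.0 - q[i_min])
    q[i_min] += shift
    remaining = shift
    for i in np.argsort(v)[::-1]:  # states in decreasing order of value
        if remaining <= 0.0:
            break
        if i == i_min:
            continue
        take = min(q[i], remaining)
        q[i] -= take
        remaining -= take
    return q


def robust_value_iteration(P_hat, R, gamma, beta, n_iter=1000, tol=1e-8):
    """sa-rectangular robust value iteration on the empirical RMDP."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(n_iter):
        for s in range(S):
            for a in range(A):
                # Adversary picks the worst kernel in the ball around P_hat[s, a].
                q = worst_case_l1(P_hat[s, a], V, beta)
                Q[s, a] = R[s, a] + gamma * (q @ V)
        V_new = Q.max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return V, Q.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma, beta = 5, 2, 0.9, 0.1
    P_true = rng.dirichlet(np.ones(S), size=(S, A))  # random nominal kernel
    R = rng.uniform(size=(S, A))
    P_hat = estimate_kernel(P_true, n_samples=200, rng=rng)
    V, policy = robust_value_iteration(P_hat, R, gamma, beta)
    print("robust values:", np.round(V, 3))
    print("greedy policy:", policy)
```

With `n_samples` draws per state-action pair the total budget is `n_samples`·|S||A|, so, reading the abstract's bounds per pair, the general-case result corresponds to `n_samples` of order H^4/ϵ^2 and the small-uncertainty result to H^3/ϵ^2, up to logarithmic factors and constants (taking H = 1/(1−γ), the usual effective horizon; the abstract itself does not define H). The greedy inner step is exact only for the L1 ball; other L_p balls would need a different inner solver.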




The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model

This paper investigates model robustness in reinforcement learning (RL) ...

Non-asymptotic Performances of Robust Markov Decision Processes

In this paper, we study the non-asymptotic performance of optimal policy...

Replicability in Reinforcement Learning

We initiate the mathematical study of replicability as an algorithmic pr...

Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning

We consider the problem of learning a control policy that is robust agai...

Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP

This work considers the sample complexity of obtaining an ε-optimal poli...

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

It is believed that a model-based approach for reinforcement learning (R...

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

We investigate the sample efficiency of reinforcement learning in a γ-di...
