
Towards Minimax Optimality of Model-based Robust Reinforcement Learning

by Pierre Clavier et al.

We study the sample complexity of obtaining an ϵ-optimal policy in robust discounted Markov Decision Processes (RMDPs), given only access to a generative model of the nominal kernel. This problem is widely studied in the non-robust case, where it is known that any planning approach applied to an empirical MDP estimated with 𝒪̃(H^3|S||A|/ϵ^2) samples provides an ϵ-optimal policy, which is minimax optimal. Results in the robust case are much scarcer. For sa-rectangular (resp. s-rectangular) uncertainty sets, the best known sample complexity is 𝒪̃(H^4|S|^2|A|/ϵ^2) (resp. 𝒪̃(H^4|S|^2|A|^2/ϵ^2)), and only for specific algorithms and for uncertainty sets based on the total variation (TV), KL, or chi-square divergence. In this paper, we consider uncertainty sets defined with an L_p-ball (recovering the TV case) and study the sample complexity of any planning algorithm (with a high-accuracy guarantee on the solution) applied to an empirical RMDP estimated using the generative model. In the general case, we prove a sample complexity of 𝒪̃(H^4|S||A|/ϵ^2) for both the sa- and s-rectangular cases (improvements by factors of |S| and |S||A|, respectively). When the size of the uncertainty set is small enough, we improve the sample complexity to 𝒪̃(H^3|S||A|/ϵ^2), matching for the first time the lower bound of the non-robust case, as well as a robust lower bound that holds when the uncertainty set is small enough.
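To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of robust value iteration on an empirical RMDP with an sa-rectangular TV-ball uncertainty set, i.e. the L_p case with p = 1. All names, the toy MDP, and the greedy inner solver are illustrative assumptions; the greedy step exploits the fact that, for a TV ball intersected with the simplex, the worst-case kernel moves probability mass from the highest-value states to the lowest-value state.

```python
import numpy as np

def worst_case_expectation(p_hat, V, beta):
    """min_{p in simplex, ||p - p_hat||_1 <= beta} p @ V, solved greedily:
    shift up to beta/2 mass from the highest-value states onto the
    lowest-value state (the LP optimum for a TV ball)."""
    p = p_hat.astype(float).copy()
    budget = beta / 2.0
    worst = int(np.argmin(V))
    moved = 0.0
    for i in np.argsort(V)[::-1]:          # highest V first
        if i == worst:
            continue
        take = min(p[i], budget - moved)
        p[i] -= take
        moved += take
        if moved >= budget - 1e-12:
            break
    p[worst] += moved
    return p @ V

def robust_value_iteration(P_hat, R, gamma, beta, iters=500):
    """Value iteration with the robust Bellman operator on the
    empirical kernel P_hat (shape S x A x S), reward R (shape S x A)."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                Q[s, a] = R[s, a] + gamma * worst_case_expectation(
                    P_hat[s, a], V, beta)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

# Toy 2-state, 2-action empirical MDP (illustrative numbers only).
P_hat = np.array([[[0.9, 0.1], [0.5, 0.5]],
                  [[0.2, 0.8], [0.6, 0.4]]])
R = np.array([[1.0, 0.5],
              [0.0, 0.8]])
V_rob, pi_rob = robust_value_iteration(P_hat, R, gamma=0.9, beta=0.2)
V_nom, pi_nom = robust_value_iteration(P_hat, R, gamma=0.9, beta=0.0)
```

With β = 0 the uncertainty set is the singleton {P̂} and the procedure reduces to standard value iteration, so the robust value is never larger than the nominal one; growing β shrinks the value monotonically, which is the price of robustness the abstract's bounds quantify in terms of samples.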



