The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model

05/26/2023
by   Laixi Shi, et al.

This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP. Despite recent efforts, the sample complexity of RMDPs has remained mostly unsettled regardless of the uncertainty set in use. It was unclear whether distributional robustness bears any statistical consequences when benchmarked against standard RL. Assuming access to a generative model that draws samples based on the nominal MDP, we characterize the sample complexity of RMDPs when the uncertainty set is specified via either the total variation (TV) distance or χ^2 divergence. The algorithm studied here is a model-based method called distributionally robust value iteration, which is shown to be near-optimal for the full range of uncertainty levels. Somewhat surprisingly, our results uncover that RMDPs are not necessarily easier or harder to learn than standard MDPs. The statistical consequence incurred by the robustness requirement depends heavily on the size and shape of the uncertainty set: when the uncertainty set is defined by the TV distance, the minimax sample complexity of RMDPs is always smaller than that of standard MDPs; when it is defined by the χ^2 divergence, the sample complexity of RMDPs can often far exceed the standard MDP counterpart.
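To make the setting concrete, the following is a minimal sketch of model-based robust value iteration with a TV uncertainty set on a small tabular MDP. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy MDP, and the definition of TV as half the L1 distance are choices made here, and the inner worst-case problem is solved with a simple greedy mass shift rather than any formulation taken from the paper.

```python
import numpy as np


def tv_worst_case_value(p0, v, sigma):
    """Minimize p @ v over distributions p with TV(p, p0) <= sigma.

    Assumes TV(p, p0) = 0.5 * ||p - p0||_1. The minimizer moves up to
    sigma probability mass from the highest-value states onto the
    lowest-value state.
    """
    p = np.asarray(p0, dtype=float).copy()
    lo = int(np.argmin(v))
    budget = min(sigma, 1.0 - p[lo])   # cannot move more mass than exists elsewhere
    p[lo] += budget
    for s in np.argsort(v)[::-1]:      # visit states from highest to lowest value
        if budget <= 0.0:
            break
        if s == lo:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
    return float(p @ v)


def estimate_nominal(P_true, n, rng):
    """Empirical nominal kernel from n generative-model draws per (s, a)."""
    S, A, _ = P_true.shape
    P_hat = np.zeros_like(P_true, dtype=float)
    for s in range(S):
        for a in range(A):
            P_hat[s, a] = rng.multinomial(n, P_true[s, a]) / n
    return P_hat


def robust_value_iteration(P, R, gamma, sigma, iters=300):
    """Iterate the robust Bellman update on kernel P with TV radius sigma."""
    S, A = R.shape
    v = np.zeros(S)
    q = np.zeros((S, A))
    for _ in range(iters):
        for s in range(S):
            for a in range(A):
                q[s, a] = R[s, a] + gamma * tv_worst_case_value(P[s, a], v, sigma)
        v = q.max(axis=1)
    return v, q.argmax(axis=1)


# Hypothetical toy problem: 3 states, 2 actions, random nominal kernel.
rng = np.random.default_rng(0)
P_true = rng.dirichlet(np.ones(3), size=(3, 2))   # shape (S, A, S)
R = rng.uniform(size=(3, 2))
P_hat = estimate_nominal(P_true, n=200, rng=rng)

v_standard, _ = robust_value_iteration(P_hat, R, gamma=0.9, sigma=0.0)
v_robust, policy = robust_value_iteration(P_hat, R, gamma=0.9, sigma=0.2)
```

Setting `sigma=0.0` recovers ordinary value iteration on the empirical model, so comparing `v_standard` with `v_robust` shows how much value the worst-case kernel removes at a given uncertainty level.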


