Sample Complexity of Robust Reinforcement Learning with a Generative Model

12/02/2021
by Kishan Panaganti, et al.

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust to parameter uncertainties arising from mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function under the worst possible model within an uncertainty set around a nominal model. The standard robust dynamic programming approach requires knowledge of the nominal model to compute the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an ϵ-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument for the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
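To make the max-min formulation concrete, here is a minimal Python sketch of the general idea for the total variation case only: estimate the nominal model from generative-model samples, then run robust value iteration on the estimate. It is an illustration under stated assumptions, not the paper's algorithm. The function names, the `radius` parameter, the convention that total variation distance is half the L1 distance, and the toy two-state MDP are all hypothetical choices made for this example.

```python
import random


def estimate_nominal(sample_next_state, n_states, n_actions, n_samples):
    """Empirical transition model from n_samples generative-model calls per (s, a)."""
    model = []
    for s in range(n_states):
        row = []
        for a in range(n_actions):
            counts = [0] * n_states
            for _ in range(n_samples):
                counts[sample_next_state(s, a)] += 1
            row.append([c / n_samples for c in counts])
        model.append(row)
    return model


def worst_case_expectation(p, v, radius):
    """Minimize sum(q[i] * v[i]) over distributions q with TV(q, p) <= radius.

    Assumes TV(q, p) = 0.5 * sum(|q[i] - p[i]|), so moving mass m costs m.
    """
    q = list(p)
    lowest = min(range(len(v)), key=lambda i: v[i])
    budget = radius
    # Take mass from the highest-value states first and give it to the lowest.
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if i == lowest or budget <= 0:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        q[lowest] += moved
        budget -= moved
    return sum(qi * vi for qi, vi in zip(q, v))


def robust_value_iteration(nominal, reward, gamma, radius, iters=200):
    """Iterate the robust Bellman operator: max over actions, min over models."""
    n_states, n_actions = len(nominal), len(nominal[0])

    def q_value(s, a, v):
        return reward[s][a] + gamma * worst_case_expectation(nominal[s][a], v, radius)

    v = [0.0] * n_states
    for _ in range(iters):
        v = [max(q_value(s, a, v) for a in range(n_actions)) for s in range(n_states)]
    policy = [max(range(n_actions), key=lambda a: q_value(s, a, v))
              for s in range(n_states)]
    return v, policy


# Toy example (made up for illustration): state 1 is rewarding, state 0 is not.
random.seed(0)
true_kernel = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
toy_reward = [[0.0, 0.0], [1.0, 1.0]]


def toy_sampler(s, a):
    # Stands in for the generative model: draws a next state for (s, a).
    return random.choices(range(2), weights=true_kernel[s][a])[0]


estimated = estimate_nominal(toy_sampler, n_states=2, n_actions=2, n_samples=1000)
values, policy = robust_value_iteration(estimated, toy_reward, gamma=0.9, radius=0.2)
print(values, policy)
```

Setting `radius=0.0` recovers ordinary value iteration on the estimated model, which gives a simple way to compare robust and non-robust values on the same samples. The chi-square and KL uncertainty sets mentioned above would need a different inner minimization and are not covered by this sketch.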


research
03/05/2023

Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning

We consider the problem of learning a control policy that is robust agai...
research
08/10/2022

Robust Reinforcement Learning using Offline Data

The goal of robust reinforcement learning (RL) is to learn a policy that...
research
09/05/2023

Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

Three major challenges in reinforcement learning are the complex dynamic...
research
05/26/2023

The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model

This paper investigates model robustness in reinforcement learning (RL) ...
research
05/09/2021

Non-asymptotic Performances of Robust Markov Decision Processes

In this paper, we study the non-asymptotic performance of optimal policy...
research
09/21/2022

First-order Policy Optimization for Robust Markov Decision Process

We consider the problem of solving robust Markov decision process (MDP),...
research
05/30/2019

Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm

We present a novel algorithm to estimate the barycenter of arbitrary pro...
