Robust Reinforcement Learning using Offline Data

08/10/2022
by Kishan Panaganti, et al.

The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters. Parameter uncertainty commonly occurs in many real-world RL applications due to simulator modeling errors, changes in the real-world system dynamics over time, and adversarial disturbances. Robust RL is typically formulated as a max-min problem, where the objective is to learn the policy that maximizes the value against the worst possible models that lie in an uncertainty set. In this work, we propose a robust RL algorithm called Robust Fitted Q-Iteration (RFQI), which uses only an offline dataset to learn the optimal robust policy. Robust RL with offline data is significantly more challenging than its non-robust counterpart because of the minimization over all models present in the robust Bellman operator. This poses challenges in offline data collection, optimization over the models, and unbiased estimation. In this work, we propose a systematic approach to overcome these challenges, resulting in our RFQI algorithm. We prove that RFQI learns a near-optimal robust policy under standard assumptions and demonstrate its superior performance on standard benchmark problems.
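To make the max-min structure concrete, here is a minimal sketch of the underlying idea in a tabular setting: estimate a nominal model from the offline data, then iterate a robust Bellman update whose inner minimization over a KL uncertainty ball is evaluated through its one-dimensional dual. This is not the RFQI algorithm from the paper, which works with function approximation and learns the dual variable with a separate function class; the KL ball, the grid search over the dual variable, and all function names below are illustrative assumptions.

```python
import numpy as np


def robust_inner_min_kl(p_nominal, values, rho, betas=None):
    """Worst-case expectation of `values` over a KL ball of radius `rho`
    around the nominal distribution `p_nominal`, via the standard dual form:

        inf_{P : KL(P || P0) <= rho} E_P[V]
            = sup_{beta > 0} ( -beta * log E_{P0}[exp(-V / beta)] - beta * rho )

    The one-dimensional sup over beta is approximated by a grid search.
    """
    if betas is None:
        betas = np.geomspace(1e-2, 1e2, 100)
    c = values.min()
    shifted = values - c  # keeps the exponent <= 0, so exp() cannot overflow
    duals = [
        c - b * np.log(p_nominal @ np.exp(-shifted / b) + 1e-12) - b * rho
        for b in betas
    ]
    return max(duals)


def empirical_model(dataset, n_states, n_actions):
    """Empirical (nominal) MDP estimated from offline (s, a, r, s') tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in dataset:
        counts[s, a, s_next] += 1.0
        reward_sum[s, a] += r
    visits = counts.sum(axis=2)
    safe_visits = np.maximum(visits, 1.0)
    P_hat = counts / safe_visits[:, :, None]
    R_hat = reward_sum / safe_visits
    P_hat[visits == 0] = 1.0 / n_states  # crude fallback for unvisited (s, a) pairs
    return P_hat, R_hat


def robust_q_iteration(P_hat, R_hat, gamma=0.95, rho=0.1, n_iters=200):
    """Tabular robust Q-iteration against the empirical nominal model."""
    n_states, n_actions, _ = P_hat.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.max(axis=1)  # greedy value of the current iterate
        Q = np.array([
            [R_hat[s, a] + gamma * robust_inner_min_kl(P_hat[s, a], V, rho)
             for a in range(n_actions)]
            for s in range(n_states)
        ])
    return Q  # greedy policy: Q.argmax(axis=1)
```

With a toy offline dataset, `robust_q_iteration(*empirical_model(data, n_states, n_actions))` returns Q-values whose greedy policy hedges against all perturbations of the empirical model inside the KL ball, which is the effect the robust Bellman operator is designed to produce.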


Related research:

- Sample Complexity of Robust Reinforcement Learning with a Generative Model (12/02/2021)
  The Robust Markov Decision Process (RMDP) framework focuses on designing...

- Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning (07/19/2021)
  We study the problem of safe offline reinforcement learning (RL), the go...

- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity (08/11/2022)
  This paper concerns the central issues of model robustness and sample ef...

- ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data (11/08/2022)
  We propose a new model-based offline RL framework, called Adversarial Mo...

- Marginalized Importance Sampling for Off-Environment Policy Evaluation (09/04/2023)
  Reinforcement Learning (RL) methods are typically sample-inefficient, ma...

- BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion (07/16/2022)
  We utilize an offline reinforcement learning (RL) model for sequential t...

- Distributionally Robust Reinforcement Learning (02/23/2019)
  Generalization to unknown/uncertain environments of reinforcement learni...
