Online Robust Reinforcement Learning with Model Uncertainty

09/29/2021
by Yue Wang, et al.

Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust RL, where the uncertainty set is centered at a misspecified MDP that is assumed to be unknown and that sequentially generates a single sample trajectory. We develop a sample-based approach to estimate the unknown uncertainty set and design a robust Q-learning algorithm (tabular case) and a robust TDC algorithm (function approximation setting), both of which can be implemented in an online and incremental fashion. For the robust Q-learning algorithm, we prove that it converges to the optimal robust Q function; for the robust TDC algorithm, we prove that it converges asymptotically to some stationary points. Unlike the results in [Roy et al., 2017], our algorithms do not need any additional conditions on the discount factor to guarantee convergence. We further characterize the finite-time error bounds of the two algorithms and show that both the robust Q-learning and robust TDC algorithms converge as fast as their vanilla counterparts (within a constant factor). Our numerical experiments further demonstrate the robustness of our algorithms. Our approach can be readily extended to robustify many other algorithms, e.g., TD, SARSA, and other GTD algorithms.
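
To make the idea of a robust tabular Q-learning update concrete, the following is a minimal sketch and not the paper's algorithm as stated. It rests on assumptions the abstract does not specify: the uncertainty set is taken to be a contamination set (with probability R an adversary may move the agent to any state), rewards are maximized, and exploration is uniform. The names robust_q_learning, toy_step, and the parameter R are hypothetical and chosen only for illustration.

```python
import random


def toy_step(state, action, rng):
    """Hypothetical 2-state environment: the action chosen becomes the
    next state, and the reward is 1 when the current state is 1."""
    reward = 1.0 if state == 1 else 0.0
    return action, reward


def robust_q_learning(step, n_states, n_actions, R=0.2, gamma=0.9,
                      alpha=0.1, n_steps=20000, seed=0):
    """Online, incremental Q-learning with a worst-case correction.

    Assumption: the true transition kernel is (1 - R) * nominal + R * q
    for an arbitrary distribution q, so the worst case puts all of the
    R mass on the state with the lowest current value.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        action = rng.randrange(n_actions)          # uniform exploration
        next_state, reward = step(state, action, rng)
        V = [max(row) for row in Q]                # greedy state values
        # Robust target: nominal sample with weight (1 - R),
        # adversarial worst-case state with weight R.
        target = reward + gamma * ((1 - R) * V[next_state] + R * min(V))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state                         # single trajectory
    return Q


if __name__ == "__main__":
    for R in (0.0, 0.2):
        Q = robust_q_learning(toy_step, n_states=2, n_actions=2, R=R)
        print(R, [round(max(row), 2) for row in Q])
```

Setting R=0 recovers ordinary Q-learning in this sketch, so the robust variant costs one extra min over the state values per step. In this toy environment the robust values are lower than the nominal ones, which reflects the pessimism built into the target.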

research
05/11/2023

On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm

We study a robust reinforcement learning (RL) with model uncertainty. Gi...
research
08/05/2021

An Elementary Proof that Q-learning Converges Almost Surely

Watkins' and Dayan's Q-learning is a model-free reinforcement learning a...
research
09/14/2022

Robust Constrained Reinforcement Learning

Constrained reinforcement learning is to maximize the expected reward su...
research
11/10/2022

When is Realizability Sufficient for Off-Policy Reinforcement Learning?

Model-free algorithms for reinforcement learning typically require a con...
research
02/20/2020

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the celebrated temporal difference (TD) learning alg...
research
05/22/2023

Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning

Offline reinforcement learning aims to find the optimal policy from a pr...
research
07/15/2023

Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation

Robustness has been extensively studied in reinforcement learning (RL) t...
