Model-Free Robust Average-Reward Reinforcement Learning

05/17/2023
by Yue Wang, et al.

Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on robust average-reward MDPs in the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, chi-squared divergence, Kullback-Leibler (KL) divergence, and Wasserstein distance.
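
To make the idea of robust RVI Q-learning more concrete, the following is a minimal tabular sketch under the contamination model. It is not the authors' code, and several details are assumptions made for illustration: the update rule is the standard RVI Q-learning form with the next-state value replaced by a one-sample worst-case estimate, the reference function is taken to be f(Q) = Q[0][0], state-action pairs are sampled uniformly as if a simulator were available, and the step size is 1/(visit count). The names `contamination_support_estimate`, `robust_rvi_q_learning`, `sample_next_state`, and `reward` are hypothetical.

```python
import random


def contamination_support_estimate(values, next_state, delta):
    """One-sample estimate of the worst-case expected next-state value.

    Assumed contamination set: {(1 - delta) * p + delta * q : q any distribution}.
    Its worst case of E[V] is (1 - delta) * E_p[V] + delta * min(V); the nominal
    expectation E_p[V] is replaced here by the value at one sampled next state.
    """
    return (1.0 - delta) * values[next_state] + delta * min(values)


def robust_rvi_q_learning(sample_next_state, reward, n_states, n_actions,
                          delta, steps=5000, seed=0):
    """Tabular sketch; returns the Q table and f(Q), the average-reward estimate."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(steps):
        # Assumption: uniform sampling of (s, a) from a simulator of the nominal MDP.
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)
        s_next = sample_next_state(s, a, rng)
        values = [max(row) for row in q]   # V(s) = max_a Q(s, a)
        offset = q[0][0]                   # assumed reference function f(Q)
        target = (reward(s, a) - offset
                  + contamination_support_estimate(values, s_next, delta))
        visits[s][a] += 1
        q[s][a] += (target - q[s][a]) / visits[s][a]  # step size 1 / visit count
    return q, q[0][0]
```

With `delta = 0` the estimate reduces to the sampled next-state value, so the sketch falls back to ordinary (non-robust) RVI Q-learning. The other uncertainty sets named in the abstract would need a different worst-case estimator in place of `contamination_support_estimate`.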


research
01/02/2023

Robust Average-Reward Markov Decision Processes

In robust Markov decision processes (MDPs), the uncertainty in the trans...
research
09/14/2022

Robust Constrained Reinforcement Learning

Constrained reinforcement learning is to maximize the expected reward su...
research
07/15/2023

Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation

Robustness has been extensively studied in reinforcement learning (RL) t...
research
06/08/2020

A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret

Recently, model-free reinforcement learning has attracted research atten...
research
10/18/2020

Average-reward model-free reinforcement learning: a systematic review and literature mapping

Model-free reinforcement learning (RL) has been an active area of resear...
research
02/02/2023

Avoiding Model Estimation in Robust Markov Decision Processes with a Generative Model

Robust Markov Decision Processes (MDPs) are getting more attention for l...
research
05/11/2020

Scalable First-Order Methods for Robust MDPs

Markov Decision Processes (MDP) are a widely used model for dynamic deci...
