Regularized Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity

06/20/2023
by   Runyu Zhang, et al.
0

This paper focuses on reinforcement learning for the regularized robust Markov decision process (MDP) problem, an extension of the robust MDP framework. We first introduce the risk-sensitive MDP and establish the equivalence between risk-sensitive MDP and regularized robust MDP. This equivalence offers an alternative perspective for addressing the regularized RMDP and enables the design of efficient learning algorithms. Given this equivalence, we further derive the policy gradient theorem for the regularized robust MDP problem and prove the global convergence of the exact policy gradient method under the tabular setting with direct parameterization. We also propose a sample-based offline learning algorithm, namely the robust fitted-Z iteration (RFZI), for a specific regularized robust MDP problem with a KL-divergence regularization term and analyze the sample complexity of the algorithm. Our results are also supported by numerical simulations.

READ FULL TEXT
research
07/09/2021

Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis

We propose policy-gradient algorithms for solving the problem of control...
research
06/10/2022

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

We study policy optimization for Markov decision processes (MDPs) with m...
research
09/21/2022

First-order Policy Optimization for Robust Markov Decision Process

We consider the problem of solving robust Markov decision process (MDP),...
research
07/10/2018

Generalized deterministic policy gradient algorithms

We study a setting of reinforcement learning, where the state transition...
research
06/06/2015

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

In this paper we address the problem of decision making within a Markov ...
research
05/27/2021

Exploitation vs Caution: Risk-sensitive Policies for Offline Learning

Offline model learning for planning is a branch of machine learning that...
research
02/08/2023

Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs

Modified policy iteration (MPI) also known as optimistic policy iteratio...

Please sign up or login with your details

Forgot password? Click here to reset