Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs

02/08/2023
by Yashaswini Murthy, et al.

Modified policy iteration (MPI), also known as optimistic policy iteration, is at the core of many reinforcement learning algorithms. It works by combining elements of policy iteration and value iteration. The convergence of MPI has been well studied in the case of discounted and average-cost MDPs. In this work, we consider the exponential cost risk-sensitive MDP formulation, which is known to provide some robustness to model parameters. Although policy iteration and value iteration have been well studied in the context of risk-sensitive MDPs, modified policy iteration is relatively unexplored. We provide the first proof that MPI also converges for the risk-sensitive problem in the case of finite state and action spaces. Since the exponential cost formulation deals with the multiplicative Bellman equation, our main contribution is a convergence proof that is quite different from existing results for discounted and risk-neutral average-cost problems. A proof for approximate modified policy iteration for risk-sensitive MDPs is also provided in the appendix.
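
To make the idea concrete, the following is a minimal sketch of MPI on a finite exponential cost MDP. It is not the algorithm or notation of the paper. It assumes the commonly used multiplicative Bellman equation, lambda * h(i) = min_a exp(alpha * c(i, a)) * sum_j P(j | i, a) * h(j), and it keeps the iterates bounded by dividing by the value at a reference state, a device borrowed from relative value iteration. The function name, the arguments `m`, `iters`, and `ref`, and the toy MDP are all hypothetical choices made for illustration.

```python
import math


def modified_policy_iteration(P, c, alpha, m=5, iters=200, ref=0):
    """Sketch of MPI for an exponential cost MDP (illustrative assumptions only).

    P[s][a][t] is the probability of moving from state s to state t under action a,
    c[s][a] is the one-step cost, and alpha > 0 is the risk-sensitivity parameter.
    """
    n = len(P)
    V = [1.0] * n      # positive initial guess, so V[ref] == 1
    policy = [0] * n
    lam = 1.0          # running estimate of the multiplicative "eigenvalue"

    def backup(s, a, values):
        # One multiplicative Bellman backup for the pair (s, a).
        expected = sum(p * v for p, v in zip(P[s][a], values))
        return math.exp(alpha * c[s][a]) * expected

    for _ in range(iters):
        # Policy improvement: greedy with respect to the current iterate.
        policy = [
            min(range(len(P[s])), key=lambda a: backup(s, a, V))
            for s in range(n)
        ]
        # Partial policy evaluation: m backups under the fixed policy,
        # each followed by normalization at the reference state (assumption).
        for _ in range(m):
            W = [backup(s, policy[s], V) for s in range(n)]
            lam = W[ref]
            V = [w / lam for w in W]

    # Risk-sensitive average cost estimate under the final policy.
    return policy, V, math.log(lam) / alpha


# Toy MDP (hypothetical): two states, two actions, and transitions that do
# not depend on the action, so actions differ only in their cost.
P = [[[0.5, 0.5], [0.5, 0.5]],
     [[0.5, 0.5], [0.5, 0.5]]]
c = [[1.0, 2.0],
     [3.0, 0.5]]
policy, V, avg_cost = modified_policy_iteration(P, c, alpha=1.0)
```

Setting `m=1` makes each round a single backup under the greedy policy, which corresponds to (relative) value iteration, while a large `m` approaches full policy evaluation and hence policy iteration. Intermediate values of `m` give the trade-off that MPI is meant to capture.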


