Mean-Variance Optimization in Markov Decision Processes

04/29/2011
by Shie Mannor et al.

We consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. Finally, we offer pseudopolynomial exact and approximation algorithms.
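To make the performance measure concrete, here is a minimal sketch (not the paper's algorithm) that computes the exact mean and variance of the cumulative reward for one fixed policy in a small finite-horizon MDP by enumerating all trajectories. The two-state MDP, its rewards, and the policy below are hypothetical examples chosen purely for illustration.

```python
# Minimal sketch: exact mean and variance of the cumulative reward for a
# fixed Markov policy in a small finite-horizon MDP, by enumerating all
# trajectories. The MDP (P, R), horizon, and policy are hypothetical.

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {"a": [(0, 0.5), (1, 0.5)], "b": [(0, 1.0)]},
    1: {"a": [(1, 1.0)], "b": [(0, 0.2), (1, 0.8)]},
}
R = {0: {"a": 1.0, "b": 0.3}, 1: {"a": 0.0, "b": 0.6}}
HORIZON = 3
START = 0

def policy(state, t):
    """A fixed deterministic Markov policy (illustrative only)."""
    return "a" if state == 0 else "b"

def reward_distribution():
    """Return {cumulative_reward: probability} over all HORIZON-step
    trajectories from START under the fixed policy."""
    # Each partial trajectory carries (state, accumulated reward, probability).
    frontier = [(START, 0.0, 1.0)]
    for t in range(HORIZON):
        nxt = []
        for s, acc, p in frontier:
            a = policy(s, t)
            for s2, q in P[s][a]:
                nxt.append((s2, acc + R[s][a], p * q))
        frontier = nxt
    dist = {}
    for _, acc, p in frontier:
        dist[acc] = dist.get(acc, 0.0) + p
    return dist

dist = reward_distribution()
mean = sum(r * p for r, p in dist.items())
var = sum((r - mean) ** 2 * p for r, p in dist.items())
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```

Note that brute-force enumeration is exponential in the horizon; this sketch only evaluates one fixed policy, whereas the hardness results and pseudopolynomial algorithms in the paper concern optimizing the mean reward over policies subject to a variance constraint.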


Related research

- A unified algorithm framework for mean-variance optimization in discounted Markov decision processes (01/15/2022)
  This paper studies the risk-averse mean-variance optimization in infinit...

- Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes (01/01/2013)
  In this paper we extend temporal difference policy evaluation algorithms...

- Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space (05/27/2022)
  Reward optimization in fully observable Markov decision processes is equ...

- Learning in Markov Decision Processes under Constraints (02/27/2020)
  We consider reinforcement learning (RL) in Markov Decision Processes (MD...

- Risk-Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance (08/09/2020)
  This paper investigates the optimization problem of an infinite stage di...

- Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks (04/20/2015)
  This paper describes a novel method to solve average-reward semi-Markov ...

- Contextual Markov Decision Processes (02/08/2015)
  We consider a planning problem where the dynamics and rewards of the env...
