Mean-Variance Optimization in Markov Decision Processes

04/29/2011

We consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that computing a policy that maximizes the mean reward under a variance constraint is NP-hard in some cases and strongly NP-hard in others. Finally, we offer pseudopolynomial exact and approximation algorithms.
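
To illustrate why randomization can help under a variance constraint, here is a minimal sketch on a toy one-step problem. The problem, the action names `safe` and `risky`, the reward values, and the cap `VARIANCE_CAP` are all assumptions made for illustration; they are not taken from the paper, and the grid search is not one of the paper's algorithms.

```python
import math

# Hypothetical one-step MDP (illustrative, not from the paper):
#   action "safe":  reward 0 with probability 1
#   action "risky": reward 2 or 0, each with probability 1/2
VARIANCE_CAP = 0.5  # assumed constraint: Var[R] <= 0.5


def mean_and_variance(p_risky):
    """Mean and variance of the reward when "risky" is chosen w.p. p_risky."""
    # The reward equals 2 with probability p_risky / 2, and 0 otherwise.
    mean = 2 * (p_risky / 2)
    second_moment = 4 * (p_risky / 2)
    return mean, second_moment - mean ** 2


def best_mean(candidates):
    """Largest mean among candidate mixing probabilities that satisfy the cap."""
    feasible = [
        mean_and_variance(p)[0]
        for p in candidates
        if mean_and_variance(p)[1] <= VARIANCE_CAP + 1e-12
    ]
    return max(feasible)


# Deterministic policies may only pick one action with certainty.
deterministic = best_mean([0.0, 1.0])
# Randomized policies may mix the two actions; search a grid of mixtures.
randomized = best_mean([i / 1000 for i in range(1001)])
# Closed form for this toy problem: largest p with 2p - p^2 <= cap.
closed_form = 1 - math.sqrt(1 - VARIANCE_CAP)

print(deterministic, randomized, closed_form)
```

In this toy problem the risky action alone violates the cap (its variance is 1), so the best deterministic policy earns a mean of 0, whereas mixing in the risky action with probability of about 0.29 stays within the cap and earns a strictly positive mean.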
