Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes

01/01/2013
by   Aviv Tamar, et al.
0

In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose both TD(0) and LSTD(lambda) variants with linear function approximation, prove their convergence, and demonstrate their utility in a 4-dimensional continuous state space problem.

READ FULL TEXT
research
06/27/2012

Policy Gradients with Variance Related Risk Criteria

Managing risk in dynamic decision problems is of cardinal importance in ...
research
07/09/2019

Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping

Variance plays a crucial role in risk-sensitive reinforcement learning, ...
research
04/29/2011

Mean-Variance Optimization in Markov Decision Processes

We consider finite horizon Markov decision processes under performance m...
research
05/14/2019

Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models

We introduce an off-policy evaluation procedure for highlighting episode...
research
04/22/2020

Per-Step Reward: A New Perspective for Risk-Averse Reinforcement Learning

We present a new per-step reward perspective for risk-averse control in ...
research
12/27/2017

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

We consider off-policy temporal-difference (TD) learning methods for pol...
research
09/09/2019

Policy Space Identification in Configurable Environments

We study the problem of identifying the policy space of a learning agent...

Please sign up or login with your details

Forgot password? Click here to reset