Algorithms for CVaR Optimization in MDPs

06/12/2014
by Yinlam Chow, et al.

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures and, because of its computational efficiency, has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
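
For readers unfamiliar with the risk measure, here is a minimal sketch of estimating CVaR from sampled costs. It is not taken from the paper. It assumes the standard Rockafellar-Uryasev representation, CVaR_alpha(Z) = min over nu of { nu + E[(Z - nu)^+] / (1 - alpha) }, evaluated at the empirical value-at-risk. The function name `empirical_var_cvar` and the sample data are hypothetical.

```python
import math


def empirical_var_cvar(costs, alpha):
    """Return (VaR, CVaR) of a sample of costs at confidence level alpha.

    Illustrative sketch only: uses the empirical alpha-quantile as VaR and
    plugs it into the Rockafellar-Uryasev expression for CVaR.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not costs:
        raise ValueError("costs must be non-empty")

    z = sorted(costs)
    n = len(z)

    # VaR: smallest sample value nu with empirical P(Z <= nu) >= alpha.
    k = max(math.ceil(alpha * n) - 1, 0)
    var = z[k]

    # CVaR = nu + E[(Z - nu)^+] / (1 - alpha), with nu set to the VaR.
    mean_excess = sum(max(c - var, 0.0) for c in z) / n
    cvar = var + mean_excess / (1.0 - alpha)
    return var, cvar


# Hypothetical cost sample: the worst half is {3, 4}, whose mean is 3.5.
print(empirical_var_cvar([1, 2, 3, 4], 0.5))  # (2, 3.5)
```

Unlike variance, this quantity depends only on the upper tail of the cost distribution, so it penalizes large costs without penalizing unusually small ones.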


research
03/25/2014

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

In many sequential decision-making problems we may want to manage risk b...
research
12/05/2015

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

In many sequential decision-making problems one is interested in minimiz...
research
06/30/2023

Risk-sensitive Actor-free Policy via Convex Optimization

Traditional reinforcement learning methods optimize agents without consi...
research
04/15/2014

Optimizing the CVaR via Sampling

Conditional Value at Risk (CVaR) is a prominent risk measure that is bei...
research
06/27/2012

Policy Gradients with Variance Related Risk Criteria

Managing risk in dynamic decision problems is of cardinal importance in ...
research
04/09/2018

Policy Gradient With Value Function Approximation For Collective Multiagent Planning

Decentralized (PO)MDPs provide an expressive framework for sequential de...
research
05/12/2014

Policy Gradients for CVaR-Constrained MDPs

We study a risk-constrained version of the stochastic shortest path (SSP...
