Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping

07/09/2019
by   Shuai Ma, et al.
Variance plays a crucial role in risk-sensitive reinforcement learning, and most risk measures can be analyzed via variance. In this paper, we consider two law-invariant risks as examples: mean-variance risk and exponential utility risk. With the aid of the state-augmentation transformation (SAT), we show that the two risks can be estimated in Markov decision processes (MDPs) with a stochastic transition-based reward and a randomized policy. To mitigate the enlarged state space, a novel definition of isotopic states is proposed for state lumping, which exploits the special structure of the transformed transition probability. In the numerical experiment, we illustrate state lumping in the SAT, the errors introduced by a naive reward simplification, and the validity of the SAT for estimating the two risks.
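
As a rough illustration of why a naive reward simplification can distort these risks, the sketch below compares a one-step toy process against a version whose stochastic reward is replaced by its expectation. It is not the paper's method or experiment: the toy process, the function names, the parameters `lam` and `beta`, and the particular forms used for the two risks (mean minus `lam` times variance, and `(1/beta) * log E[exp(beta * R)]`) are all assumptions chosen for illustration.

```python
import math


def mean_variance(outcomes, lam=1.0):
    """Mean-variance risk of a discrete return distribution.

    outcomes: list of (probability, return) pairs.
    Assumed convention: mean - lam * variance.
    """
    mean = sum(p * r for p, r in outcomes)
    var = sum(p * (r - mean) ** 2 for p, r in outcomes)
    return mean - lam * var


def exponential_utility(outcomes, beta=-1.0):
    """Exponential utility risk, assumed here as (1/beta) * log E[exp(beta * R)]."""
    expectation = sum(p * math.exp(beta * r) for p, r in outcomes)
    return math.log(expectation) / beta


# Hypothetical one-step process: two equally likely transitions
# with transition-based rewards +1 and -1.
true_outcomes = [(0.5, 1.0), (0.5, -1.0)]

# Naive simplification: collapse the reward to its expected value.
expected_reward = sum(p * r for p, r in true_outcomes)
naive_outcomes = [(1.0, expected_reward)]

mv_true = mean_variance(true_outcomes)
mv_naive = mean_variance(naive_outcomes)
eu_true = exponential_utility(true_outcomes)
eu_naive = exponential_utility(naive_outcomes)

print("mean-variance:", mv_true, "vs naive", mv_naive)
print("exponential utility:", eu_true, "vs naive", eu_naive)
```

In this toy case both versions have the same mean return, but the simplified one has zero variance, so both risk values differ between the two versions.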

research
01/01/2013

Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes

In this paper we extend temporal difference policy evaluation algorithms...
research
07/09/2019

A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

We present a scheme for sequential decision making with a risk-sensitive...
research
01/15/2022

A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

This paper studies the risk-averse mean-variance optimization in infinit...
research
03/28/2022

Risk regularization through bidirectional dispersion

Many alternative notions of "risk" (e.g., CVaR, entropic risk, DRO risk)...
research
01/14/2023

Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures

Traditional reinforcement learning (RL) aims to maximize the expected to...
research
08/09/2020

Risk-Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance

This paper investigates the optimization problem of an infinite stage di...
research
06/27/2019

Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes

The honeynet is a promising active cyber defense mechanism. It reveals t...
