Effect of Reward Function Choices in MDPs with Value-at-Risk

12/07/2016
by Shuai Ma, et al.

This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with a finite state space and two different reward functions. First, we examine the effects of the two reward functions under two criteria in a short-horizon MDP, and show that under the VaR criterion, when the original reward function depends on both the current and the next state, simplifying the reward changes the VaR. Second, for long-horizon MDPs, we estimate the Pareto front of the set of total-reward distributions with the aid of spectral theory and the central limit theorem. Since this estimation applies only to a Markov process with the simplified reward function, we present a transformation algorithm for the Markov process with the original reward function, so that the Pareto front can be estimated while keeping the total-reward distribution intact.
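To see why the first claim is plausible, consider collapsing a reward r(s, s') that depends on both the current and next states into its conditional expectation, a reward on the current state alone. The sketch below is a minimal illustration of this effect, not the paper's construction: the two-state fragment, transition probabilities, reward values, and quantile level alpha are all invented for the example.

    import numpy as np

    # Hypothetical one-step fragment: from state s=0 the process moves to
    # successor s' = 0 or s' = 1; the original reward depends on s'.
    P = np.array([0.5, 0.5])    # transition probabilities P(s' | s=0)
    r = np.array([0.0, 10.0])   # original reward r(s=0, s')

    # Common simplification: replace r(s, s') by its conditional mean,
    # a reward that depends on the current state only.
    r_bar = P @ r               # r_bar(0) = 5.0

    def value_at_risk(samples, alpha):
        # VaR at level alpha: the alpha-quantile of the total reward.
        return np.quantile(samples, alpha)

    rng = np.random.default_rng(seed=0)
    s_next = rng.choice(2, size=100_000, p=P)

    total_original = r[s_next]                       # random: 0 or 10
    total_simplified = np.full(s_next.shape, r_bar)  # degenerate: always 5

    alpha = 0.25
    print(total_original.mean(), total_simplified.mean())  # both approx. 5.0
    print(value_at_risk(total_original, alpha))             # 0.0
    print(value_at_risk(total_simplified, alpha))           # 5.0

Both reward functions have the same expected total reward, so mean-based criteria cannot tell them apart, but the 0.25-level VaR moves from 0 under the original reward to 5 under the simplified one, which is exactly the discrepancy the VaR criterion is sensitive to.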


Related research

04/16/2018
Distribution Estimation in Discounted MDPs via a Transformation
Although the general deterministic reward function in MDPs takes three a...

01/15/2022
A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
This paper studies the risk-averse mean-variance optimization in infinit...

06/25/2017
Specifying Non-Markovian Rewards in MDPs Using LDL on Finite Traces (Preliminary Version)
In Markov Decision Processes (MDPs), the reward obtained in a state depe...

02/09/2020
Maximizing the Total Reward via Reward Tweaking
In reinforcement learning, the discount factor γ controls the agent's ef...

05/08/2018
Fast Online Exact Solutions for Deterministic MDPs with Sparse Rewards
Markov Decision Processes (MDPs) are a mathematical framework for modeli...

05/09/2012
Regret-based Reward Elicitation for Markov Decision Processes
The specification of a Markov decision process (MDP) can be difficult. Re...

09/15/2019
Exploiting Fast Decaying and Locality in Multi-Agent MDP with Tree Dependence Structure
This paper considers a multi-agent Markov Decision Process (MDP), where ...
