Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty

06/23/2020
by   Nelson Vadori, et al.
0

We introduce a novel framework to account for sensitivity to rewards uncertainty in sequential decision-making problems. While risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim at learning policies sensitive to the uncertain/stochastic nature of the rewards, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool - the chaotic variation - which can rigorously be interpreted as the risk measure of the martingale component associated to the cumulative reward process. We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy gradient and value function based, and illustrate its relevance on grid world and portfolio optimization problems.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

02/22/2022

Approximate gradient ascent methods for distortion risk measures

We propose approximate gradient ascent algorithms for risk-sensitive rei...
11/04/2021

Model-Free Risk-Sensitive Reinforcement Learning

We extend temporal-difference (TD) learning in order to obtain risk-sens...
06/22/2020

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

We study risk-sensitive reinforcement learning in episodic Markov decisi...
07/09/2019

A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

We present a scheme for sequential decision making with a risk-sensitive...
02/27/2020

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

We study the estimation of risk-sensitive policies in reinforcement lear...
10/22/2019

Teach Biped Robots to Walk via Gait Principles and Reinforcement Learning with Adversarial Critics

Controlling a biped robot to walk stably is a challenging task consideri...
04/20/2015

Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks

This paper describes a novel method to solve average-reward semi-Markov ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.