Risk-Averse Learning by Temporal Difference Methods

03/02/2020
by Umit Kose, et al.

We consider reinforcement learning with performance evaluated by a dynamic risk measure. We construct a projected risk-averse dynamic programming equation and study its properties. We then propose risk-averse counterparts of the temporal difference methods and prove their convergence with probability one. We also perform an empirical study on a complex transportation problem.
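
As a rough illustration of the idea, and not the paper's exact algorithm, the sketch below runs a TD(0)-style evaluation of a small Markov cost process in which the sampled one-step target is adjusted by an assumed mean-upper-semideviation risk mapping. The transition matrix, costs, step sizes, and the risk parameter kappa are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov cost process used only for illustration.
P = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.3, 0.7]])   # transition probabilities
c = np.array([1.0, 2.0, 0.5])    # one-step costs

def risk_averse_td0(steps=50_000, gamma=0.9, kappa=0.5,
                    alpha=0.02, beta=0.05):
    """TD(0)-style evaluation with a mean-upper-semideviation adjustment,
    rho(Z) = E[Z] + kappa * E[(Z - E[Z])_+], applied to the one-step target."""
    n = len(c)
    v = np.zeros(n)   # risk-averse value estimates
    m = np.zeros(n)   # running estimates of the conditional mean target
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])
        target = c[s] + gamma * v[s_next]
        # Larger step size so m[s] roughly tracks E[target | s] under the current v.
        m[s] += beta * (target - m[s])
        # Penalize targets that exceed the estimated mean (upper semideviation).
        risk_target = target + kappa * max(target - m[s], 0.0)
        v[s] += alpha * (risk_target - v[s])
        s = s_next
    return v

print(risk_averse_td0())

With kappa = 0 the update reduces to ordinary TD(0); increasing kappa makes the value estimates more pessimistic about costly transitions.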
