Robust temporal difference learning for critical domains

01/23/2019
by Richard Klima et al.

We present a new Q-function operator for temporal difference (TD) learning methods that explicitly encodes robustness against significant rare events (SREs) in critical domains. The operator, which we call the κ-operator, makes it possible to learn a safe policy in a model-based fashion without ever observing an SRE. We introduce single- and multi-agent robust TD methods based on the κ-operator. Using the theory of Generalized Markov Decision Processes, we prove convergence of the operator to the optimal safe Q-function with respect to the model, and we further prove convergence to the optimal Q-function of the original MDP as the probability of SREs vanishes. Empirical evaluations demonstrate the superior performance of κ-based TD methods both in the early learning phase and at the final converged stage. We also show that the proposed method is robust to small model errors and that it remains applicable in a multi-agent context.
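As a rough illustration (not the paper's exact formulation), the sketch below shows a tabular Q-learning update built around a κ-style backup: the greedy (best-case) backup is blended with a worst-case backup, weighted by a modelled SRE probability κ, so the worst case influences the target without the rare event ever being observed. The function names, parameters, and the exact mixing rule are assumptions for illustration only.

```python
import numpy as np

def kappa_backup(Q, next_state, kappa):
    """Assumed κ-style backup: convex mix of the greedy (best-case) and
    worst-case action values at next_state, weighted by the modelled
    probability κ of a significant rare event."""
    best = np.max(Q[next_state])
    worst = np.min(Q[next_state])
    return (1.0 - kappa) * best + kappa * worst

def kappa_q_learning_step(Q, s, a, r, s_next,
                          alpha=0.1, gamma=0.99, kappa=0.05):
    """One robust TD update using the κ-style backup as the bootstrap target.
    Because κ enters only through the backup, no SRE needs to occur in the
    observed transition (s, a, r, s_next)."""
    target = r + gamma * kappa_backup(Q, s_next, kappa)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Minimal usage: a 5-state, 3-action table updated on one synthetic transition.
Q = np.zeros((5, 3))
Q = kappa_q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)
```

Setting κ = 0 recovers the standard Q-learning target, while larger κ pushes the learned policy toward worst-case (safer) behaviour under the assumed model.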

research
01/15/2020

Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping

We present a new model-based reinforcement learning algorithm, Cooperati...
research
07/31/2023

Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes

The main goal of this paper is to investigate distributed dynamic progra...
research
06/18/2020

Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

In this paper we propose novel distributed gradient-based temporal diffe...
research
02/27/2019

Learning Factored Markov Decision Processes with Unawareness

Methods for learning and planning in sequential decision problems often ...
research
12/16/2022

Towards Causal Temporal Reasoning for Markov Decision Processes

We introduce a new probabilistic temporal logic for the verification of ...
research
05/20/2019

A Bayesian Approach to Robust Reinforcement Learning

Robust Markov Decision Processes (RMDPs) intend to ensure robustness wit...
research
12/04/2019

A Variational Perturbative Approach to Planning in Graph-based Markov Decision Processes

Coordinating multiple interacting agents to achieve a common goal is a d...
