Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning

04/10/2017
by El Mahdi El Mhamdi, et al.

In reinforcement learning, agents learn by performing actions and observing their outcomes. Sometimes, it is desirable for a human operator to interrupt an agent in order to prevent dangerous situations from happening. Yet, as part of their learning process, agents may link these interruptions, which affect their reward, to specific states and deliberately avoid them. The situation is particularly challenging in a multi-agent context because agents might learn not only from their own past interruptions, but also from those of other agents. Orseau and Armstrong defined safe interruptibility for a single learner, but their work does not naturally extend to multi-agent systems. This paper introduces dynamic safe interruptibility, an alternative definition better suited to decentralized learning problems, and studies this notion in two learning frameworks: joint action learners and independent learners. We give realistic sufficient conditions on the learning algorithm to enable dynamic safe interruptibility in the case of joint action learners, yet show that these conditions are not sufficient for independent learners. We show, however, that if agents can detect interruptions, it is possible to prune the observations to ensure dynamic safe interruptibility even for independent learners.
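The pruning idea in the abstract can be sketched in code. The following is a minimal, hypothetical illustration (class and parameter names are my own, not from the paper): a tabular Q-learner that, when it can detect that a transition was caused by an operator interruption, simply drops that observation from its updates, so the interruption signal never leaks into the value estimates.

```python
class PruningQLearner:
    """Tabular Q-learner that prunes interrupted transitions.

    Hypothetical sketch of the abstract's pruning idea; not the
    paper's actual algorithm or notation.
    """

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor

    def update(self, s, a, r, s_next, interrupted):
        # Prune: skip transitions flagged as interruptions, so the
        # agent never associates interruption-altered rewards with
        # the states in which they occurred.
        if interrupted:
            return
        best_next = max(self.q[s_next])
        td_target = r + self.gamma * best_next
        self.q[s][a] += self.alpha * (td_target - self.q[s][a])


learner = PruningQLearner(n_states=2, n_actions=2)
learner.update(0, 1, 1.0, 1, interrupted=False)   # normal step: learned
learner.update(0, 1, -10.0, 1, interrupted=True)  # interruption: pruned
```

After these two calls, only the first transition has shaped `q[0][1]`; the interrupted one left the table untouched, which is the point of pruning in the independent-learner setting.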


Related research:

- 01/27/2021 | Safe Multi-Agent Reinforcement Learning via Shielding
  Multi-agent reinforcement learning (MARL) has been increasingly used in ...

- 09/08/2017 | Prosocial learning agents solve generalized Stag Hunts better than selfish ones
  Deep reinforcement learning has become an important paradigm for constru...

- 12/29/2019 | Individual specialization in multi-task environments with multiagent reinforcement learners
  There is a growing interest in Multi-Agent Reinforcement Learning (MARL)...

- 05/29/2018 | Virtuously Safe Reinforcement Learning
  We show that when a third party, the adversary, steps into the two-party...

- 12/29/2019 | Loss aversion fosters coordination among independent reinforcement learners
  We study what are the factors that can accelerate the emergence of colla...

- 08/07/2023 | Asynchronous Decentralized Q-Learning: Two Timescale Analysis By Persistence
  Non-stationarity is a fundamental challenge in multi-agent reinforcement...
