How RL Agents Behave When Their Actions Are Modified

02/15/2021
by Eric D. Langlois, et al.

Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result of supervisor intervention, the executed action may differ from the action specified by the policy. How does this affect learning? We present the Modified-Action Markov Decision Process, an extension of the MDP model that allows actions to differ from the policy. We analyze the asymptotic behaviours of common reinforcement learning algorithms in this setting and show that they adapt in different ways: some completely ignore modifications while others go to various lengths in trying to avoid action modifications that decrease reward. By choosing the right algorithm, developers can prevent their agents from learning to circumvent interruptions or constraints, and better control agent responses to other kinds of action modification, like self-damage.
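
The core idea of the abstract, that the executed action may differ from the action the policy specifies, can be illustrated with a minimal sketch. This is not the authors' formalism or code; the names `step_with_modification`, `toy_policy`, `toy_supervisor`, and `toy_transition`, along with the toy rewards, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Tuple

State = int
Action = int

# Hypothetical action labels for the toy example.
SAFE, DANGEROUS = 0, 1


def step_with_modification(
    state: State,
    policy: Callable[[State], Action],
    modify: Callable[[State, Action], Action],
    transition: Callable[[State, Action], Tuple[State, float]],
) -> Tuple[Action, Action, State, float]:
    """One environment step in which the executed action may differ
    from the action chosen by the policy."""
    proposed = policy(state)            # action specified by the policy
    executed = modify(state, proposed)  # action after supervisor intervention
    next_state, reward = transition(state, executed)  # environment sees only `executed`
    return proposed, executed, next_state, reward


def toy_policy(state: State) -> Action:
    # Assumed policy: always proposes the dangerous action.
    return DANGEROUS


def toy_supervisor(state: State, action: Action) -> Action:
    # Assumed supervisor: replaces the dangerous action with the safe one.
    return SAFE if action == DANGEROUS else action


def toy_transition(state: State, action: Action) -> Tuple[State, float]:
    # Assumed dynamics: the dangerous action would pay more if it were executed.
    reward = 1.0 if action == DANGEROUS else 0.5
    return state + 1, reward


proposed, executed, next_state, reward = step_with_modification(
    0, toy_policy, toy_supervisor, toy_transition
)
```

Here the agent proposes `DANGEROUS` but the environment transitions on `SAFE`. Whether a learning update uses `proposed` or `executed` is the kind of distinction that, per the abstract, leads different algorithms to respond differently to modification.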


