"I'm sorry Dave, I'm afraid I can't do that" Deep Q-learning from forbidden action

10/04/2019
by Mathieu Seurin, et al.

The use of Reinforcement Learning (RL) is still restricted to simulation or to enhancing human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind, implemented in the form of valid-action masks or contingency controllers. For example, the range of motion and the angles of a robot's motors can be limited to physical boundaries. Violating a constraint thus results in either a rejected action or a switch to a safe mode driven by an external controller, which leaves RL agents unable to learn from their mistakes. In this paper, we propose a simple modification of a state-of-the-art deep RL algorithm (DQN) that enables learning from forbidden actions. To do so, the standard Q-learning update is augmented with an extra safety loss inspired by structured classification. We show empirically that, compared to standard DQN, it reduces the number of constraints hit during the learning phase and accelerates convergence to near-optimal policies. Experiments are conducted on a Visual Grid World environment and a Text-World domain.
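
As a rough illustration of the idea in the abstract, the sketch below adds a margin-based penalty to the usual one-step TD loss whenever the environment rejects an action. It is not the paper's exact formulation: the hinge form, the comparison against all other actions, and the names `forbidden_margin_loss`, `combined_loss`, `margin`, and `safety_weight` are all assumptions made for this example.

```python
from typing import Sequence


def td_loss(q_taken: float, reward: float, next_q_values: Sequence[float],
            gamma: float = 0.99, done: bool = False) -> float:
    """Squared one-step TD error, as in standard Q-learning."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (q_taken - target) ** 2


def forbidden_margin_loss(q_values: Sequence[float], forbidden_action: int,
                          margin: float = 0.1) -> float:
    """Hinge penalty (assumed form): the rejected action's Q-value should sit
    at least `margin` below the lowest Q-value among the other actions."""
    others = [q for a, q in enumerate(q_values) if a != forbidden_action]
    return max(0.0, q_values[forbidden_action] + margin - min(others))


def combined_loss(q_values: Sequence[float], action: int, reward: float,
                  next_q_values: Sequence[float], was_rejected: bool,
                  gamma: float = 0.99, done: bool = False,
                  margin: float = 0.1, safety_weight: float = 1.0) -> float:
    """TD loss plus the safety term, applied only when the action was rejected."""
    loss = td_loss(q_values[action], reward, next_q_values, gamma, done)
    if was_rejected:
        # `safety_weight` is a hypothetical trade-off coefficient.
        loss += safety_weight * forbidden_margin_loss(q_values, action, margin)
    return loss
```

In a real DQN the same two terms would be computed on batched network outputs and minimized by gradient descent. The scalar version here only shows how a rejected action can contribute a learning signal even when the environment state does not change.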

research
04/30/2023

Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning

In many real-world applications, safety constraints for reinforcement le...
research
04/15/2022

Safe Reinforcement Learning Using Black-Box Reachability Analysis

Reinforcement learning (RL) is capable of sophisticated motion planning ...
research
02/08/2023

A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints

In many applications of Reinforcement Learning (RL), it is critically im...
research
03/06/2023

Reducing Safety Interventions in Provably Safe Reinforcement Learning

Deep Reinforcement Learning (RL) has shown promise in addressing complex...
research
09/16/2022

Optimizing Industrial HVAC Systems with Hierarchical Reinforcement Learning

Reinforcement learning (RL) techniques have been developed to optimize i...
research
05/12/2022

Contingency-constrained economic dispatch with safe reinforcement learning

Future power systems will rely heavily on micro grids with a high share ...
research
09/24/2018

Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning

In the real world, agents often have to operate in situations with incom...
