Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

07/17/2017
by William Saunders, et al.

AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human "in the loop" and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent's learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.
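The oversight loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `oversee`, `fit_blocker`, and `InterventionLog`, the toy integer states and actions, the substitute `safe_action`, and the `penalty` value are all hypothetical. A memorising lookup stands in for the supervised learner, whose actual architecture is not given in this abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

State = int
Action = int
Overseer = Callable[[State, Action], bool]  # True means "block this action"


@dataclass
class InterventionLog:
    """Labelled (state, action, blocked) triples gathered during oversight."""
    records: List[Tuple[State, Action, bool]] = field(default_factory=list)


def oversee(state: State, proposed: Action, overseer: Overseer,
            safe_action: Action, penalty: float,
            log: InterventionLog) -> Tuple[Action, float]:
    """Check a proposed action before it reaches the environment.

    Returns the action to execute and an extra reward term. Both the
    substitution and the penalty are assumptions made for this sketch.
    """
    blocked = overseer(state, proposed)
    log.records.append((state, proposed, blocked))
    if blocked:
        return safe_action, penalty
    return proposed, 0.0


def fit_blocker(log: InterventionLog) -> Overseer:
    """Toy imitation of the overseer: memorise the blocked pairs.

    A real system would train a classifier that generalises to unseen
    states; this lookup only reproduces decisions it has already seen.
    """
    blocked_pairs = {(s, a) for s, a, was_blocked in log.records if was_blocked}
    return lambda s, a: (s, a) in blocked_pairs


# Hypothetical corridor: state 0 borders a cliff, action 0 moves left.
def human_blocks(state: State, action: Action) -> bool:
    return state == 0 and action == 0


log = InterventionLog()
for s, a in [(0, 0), (0, 1), (1, 0)]:
    oversee(s, a, human_blocks, safe_action=1, penalty=-1.0, log=log)

# After the human phase, the learned blocker takes over the checks.
learned_blocker = fit_blocker(log)
```

Once `learned_blocker` replaces `human_blocks` in the call to `oversee`, no further human labor is needed for decisions it has already seen. The lookup has no ability to generalise, which loosely mirrors the failure mode the abstract reports: an agent that finds state-action pairs outside the blocker's training data can still reach a catastrophe.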


