Safe Deep RL in 3D Environments using Human Feedback

01/20/2022
by Matthew Rahtz, et al.

Agents should avoid unsafe behaviour during both training and deployment. This typically requires a simulator and a procedural specification of unsafe behaviour. Unfortunately, a simulator is not always available, and procedurally specifying constraints can be difficult or impossible for many real-world tasks. A recently introduced technique, ReQueST, aims to solve this problem by learning a neural simulator of the environment from safe human trajectories, then using the learned simulator to efficiently learn a reward model from human feedback. However, it is not yet known whether this approach is feasible in complex 3D environments with feedback obtained from real humans: whether sufficient pixel-based neural simulator quality can be achieved, and whether the human data requirements are viable in terms of both quantity and quality. In this paper we answer these questions in the affirmative, using ReQueST to train an agent to perform a 3D first-person object collection task with data collected entirely from human contractors. We show that the resulting agent exhibits an order-of-magnitude reduction in unsafe behaviour compared to standard reinforcement learning.
