Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

11/18/2017
by Benjamin Eysenbach, et al.

Deep reinforcement learning algorithms can learn complex behavioral skills, but real-world application of these methods requires a large amount of experience to be collected by the agent. In practical settings, such as robotics, this involves repeatedly attempting a task, resetting the environment between each attempt. However, not all tasks are easily or automatically reversible. In practice, this learning process requires extensive human intervention. In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt. By learning a value function for the reset policy, we can automatically determine when the forward policy is about to enter a non-reversible state, providing for uncertainty-aware safety aborts. Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, can reduce the number of unsafe actions that lead to non-reversible states, and can automatically induce a curriculum.
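The early-abort idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the uncertainty estimate comes from a small ensemble of reset value functions and that an abort is triggered when the most pessimistic estimate drops below a threshold. The names `should_abort`, `run_forward_episode`, and `q_min`, as well as the toy one-dimensional environment, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

State = int
Action = int
QFunction = Callable[[State, Action], float]


def should_abort(reset_q_estimates: Sequence[float], q_min: float) -> bool:
    """Abort when the most pessimistic reset-value estimate is below q_min.

    Taking the minimum over the ensemble is an assumed way of being
    uncertainty-aware: disagreement between members leads to an earlier abort.
    """
    return min(reset_q_estimates) < q_min


def run_forward_episode(
    state: State,
    forward_policy: Callable[[State], Action],
    reset_q_ensemble: List[QFunction],
    step: Callable[[State, Action], State],
    q_min: float,
    max_steps: int,
) -> Tuple[State, int, bool]:
    """Run the forward policy, checking each proposed action before executing it.

    Returns (final_state, steps_taken, aborted). After an abort, control
    would be handed to the reset policy (not modeled in this sketch).
    """
    for t in range(max_steps):
        action = forward_policy(state)
        estimates = [q(state, action) for q in reset_q_ensemble]
        if should_abort(estimates, q_min):
            return state, t, True  # action is NOT executed
        state = step(state, action)
    return state, max_steps, False


# Toy example (assumption): positions on a line, with an irreversible
# "cliff" at position 5. The forward policy always moves right.
forward_policy = lambda s: 1
step = lambda s, a: s + a

# Two hand-written stand-ins for learned reset value functions. They
# disagree about how close to the cliff a reset is still possible.
q_a: QFunction = lambda s, a: 1.0 if s + a < 5 else 0.0
q_b: QFunction = lambda s, a: 1.0 if s + a < 4 else 0.0

final_state, steps_taken, aborted = run_forward_episode(
    state=0,
    forward_policy=forward_policy,
    reset_q_ensemble=[q_a, q_b],
    step=step,
    q_min=0.5,
    max_steps=10,
)
print(final_state, steps_taken, aborted)
```

In this toy run the forward policy is stopped at position 3, before the action that the more cautious ensemble member considers unrecoverable. Raising `q_min` or using a more pessimistic ensemble makes aborts happen earlier, at the cost of less forward exploration.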

