
Fighting Failures with FIRE: Failure Identification to Reduce Expert Burden in Intervention-Based Learning

07/01/2020
by   Trevor Ablett, et al.

Supervised imitation learning, also known as behavior cloning, suffers from distribution drift leading to failures during policy execution. One approach to mitigating this issue is to allow an expert to correct the agent's actions during task execution, based on the expert's determination that the agent has reached a "point of no return". The agent's policy is then retrained using these new corrective data. This approach alone can produce high-performance agents, but at a high cost: the expert must vigilantly observe execution until the policy reaches a specified level of success, and even then, there is no guarantee that the policy will always succeed. To address these limitations, we present FIRE (Failure Identification to Reduce Expert burden), a system that can predict when a running policy will fail, halt its execution, and request a correction from the expert. Unlike existing approaches that learn only from expert data, our approach learns from both expert and non-expert data, akin to adversarial learning. We demonstrate experimentally on a series of challenging manipulation tasks that our method is able to recognize state-action pairs that lead to failures. This allows seamless integration into an intervention-based learning system, where we show an order-of-magnitude gain in sample efficiency compared with a state-of-the-art inverse reinforcement learning method, and drastically improved performance over behavior cloning trained on an equivalent amount of data.
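To make the described intervention loop concrete, here is a minimal, hypothetical sketch: a binary classifier is trained to distinguish state-action pairs from expert (successful) versus non-expert (failure-bound) trajectories, and the running policy is halted to request an expert correction whenever the predicted failure probability crosses a threshold. The toy data, function names, and logistic-regression classifier are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_failure_classifier(X, y, lr=0.5, steps=500):
    """Logistic regression on (state, action) features.
    y = 1 marks pairs drawn from non-expert (failure) trajectories,
    y = 0 marks pairs drawn from expert (success) trajectories."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Toy data: expert pairs cluster near the origin, failure-bound pairs farther out.
X_expert = rng.normal(0.0, 0.5, size=(200, 4))
X_fail = rng.normal(2.0, 0.5, size=(200, 4))
X = np.vstack([X_expert, X_fail])
y = np.concatenate([np.zeros(200), np.ones(200)])
w, b = train_failure_classifier(X, y)

def should_halt(state_action, threshold=0.5):
    """Halt the running policy and request an expert correction when the
    classifier predicts the current state-action pair is failure-bound."""
    return sigmoid(state_action @ w + b) > threshold

safe = np.zeros(4)        # resembles the expert data
risky = np.full(4, 2.0)   # resembles the failure data
print(should_halt(safe), should_halt(risky))
```

In an actual system, the classifier would be queried at every control step; only when `should_halt` fires does the expert need to attend, which is the source of the reduced expert burden the abstract describes.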
