Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination

07/31/2022
by Abdalkarim Mohtasib, et al.

Learning robotic tasks in the real world is still highly challenging, and effective practical solutions remain to be found. The traditional methods in this area are imitation learning and reinforcement learning, but both have limitations when applied to real robots. Combining reinforcement learning with pre-collected demonstrations is a promising approach for learning control policies that solve robotic tasks. In this paper, we propose an algorithm that uses novel techniques to leverage offline expert data through both offline and online training, obtaining faster convergence and improved performance. The proposed algorithm (AWET) weights the critic losses with a novel agent advantage weight so that the agent can improve over the expert data. In addition, AWET uses an automatic early termination technique that stops and discards policy rollouts that are not similar to expert trajectories, which prevents the policy from drifting far from the expert data. In an ablation study, AWET showed improved and promising performance compared to state-of-the-art baselines on four standard robotic tasks.
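The abstract does not say how similarity to the expert trajectories is measured. The sketch below shows one hypothetical way to implement the early-termination idea, assuming similarity is the Euclidean distance from each newly visited state to its nearest expert state, compared against a fixed threshold. The function names (min_distance_to_expert, collect_rollout), the threshold, and the toy step/policy interface are illustrative assumptions, not the paper's method.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, float, float, State, bool]


def min_distance_to_expert(state: State, expert_states: Sequence[State]) -> float:
    """Euclidean distance from `state` to the closest expert state (assumed metric)."""
    return min(math.dist(state, e) for e in expert_states)


def collect_rollout(
    step: Callable[[State, float], Tuple[State, float, bool]],
    policy: Callable[[State], float],
    start: State,
    expert_states: Sequence[State],
    threshold: float,
    max_steps: int,
) -> Tuple[List[Transition], bool]:
    """Roll out `policy` and stop early if the agent leaves the expert region.

    Returns the transitions and a flag: False means the rollout drifted
    too far from the expert data and should be discarded.
    """
    transitions: List[Transition] = []
    state = start
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = step(state, action)
        transitions.append((state, action, reward, next_state, done))
        if min_distance_to_expert(next_state, expert_states) > threshold:
            return transitions, False  # early termination: discard this rollout
        state = next_state
        if done:
            break
    return transitions, True


if __name__ == "__main__":
    # Toy 1-D task (hypothetical): the action is added to the state,
    # and the expert visited states 0, 1 and 2.
    experts = [(0.0,), (1.0,), (2.0,)]

    def toy_step(s: State, a: float) -> Tuple[State, float, bool]:
        return (s[0] + a,), 0.0, False

    def toy_policy(s: State) -> float:
        return 1.0

    traj, kept = collect_rollout(toy_step, toy_policy, (0.0,), experts, threshold=0.5, max_steps=10)
    # The third step reaches state 3.0, which is 1.0 away from the nearest
    # expert state, so the rollout stops there and is flagged for discarding.
    print(len(traj), kept)  # prints "3 False"
```

A rollout flagged False would simply not be added to the replay buffer, which is one plausible reading of "stop and discard" in the abstract.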

research
12/07/2022

Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Deep reinforcement learning (DRL) provides a new way to generate robot c...
research
03/20/2023

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

In this paper, we address the following problem: Given an offline demons...
research
10/21/2021

Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information

End-to-end learning robotic manipulation with high data efficiency is on...
research
11/27/2020

Offline Learning from Demonstrations and Unlabeled Experience

Behavior cloning (BC) is often practical for robot learning because it a...
research
01/27/2023

Behaviour Discriminator: A Simple Data Filtering Method to Improve Offline Policy Learning

This paper studies the problem of learning a control policy without the ...
research
03/10/2020

SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks

Recent advances in deep reinforcement learning (RL) have demonstrated it...
research
06/09/2021

Offline Inverse Reinforcement Learning

The objective of offline RL is to learn optimal policies when a fixed ex...