
Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations

by Zhuangdi Zhu, et al.

Model-free deep reinforcement learning (RL) has demonstrated its superiority on many complex sequential decision-making problems. However, heavy dependence on dense rewards and high sample complexity impede the wide adoption of these methods in real-world scenarios. Imitation learning (IL), on the other hand, learns effectively in sparse-reward tasks by leveraging existing expert demonstrations. In practice, collecting a sufficient number of expert demonstrations can be prohibitively expensive, and the quality of the demonstrations typically limits the performance of the learned policy. In this work, we propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations for highly challenging sparse-reward tasks. SAIL bridges the advantages of IL and RL to reduce sample complexity substantially, by effectively exploiting sub-optimal demonstrations and efficiently exploring the environment to surpass the demonstrated performance. Extensive empirical results show that SAIL not only significantly improves sample efficiency but also achieves much better final performance across different continuous control tasks, compared to the state of the art.



