Achieving Sample-Efficient and Online-Training-Safe Deep Reinforcement Learning with Base Controllers

by Minjian Xin, et al.

Applying Deep Reinforcement Learning (DRL) algorithms to real-world robotic tasks faces many challenges. On the one hand, reward shaping for complex tasks is difficult and may result in sub-optimal performance. On the other hand, a sparse-reward setting renders exploration inefficient, and exploration on physical robots is costly and unsafe. In this paper we propose a method for learning challenging sparse-reward tasks by utilizing existing controllers. Built upon Deep Deterministic Policy Gradients (DDPG), our algorithm incorporates the controllers into the stages of exploration, Q-value estimation, and policy update. Through experiments ranging from stacking blocks to cups, we present a straightforward way of synthesizing these controllers, and show that the learned state-based or image-based policies steadily outperform them. Compared to previous work on learning from demonstrations, our method improves sample efficiency by orders of magnitude and can learn online in a safe manner. Overall, our method has the potential to leverage existing industrial robot manipulation systems to build more flexible and intelligent controllers.
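The abstract does not spell out how the base controllers enter each stage, so the following is only an illustrative sketch of one plausible reading, not the authors' algorithm. It assumes a toy one-dimensional state, and every name in it (`base_controller`, `policy`, `q_value`, `explore_action`, `td_target`, `p_base`) is hypothetical. It shows two of the three stages: exploration that sometimes defers to the base controller, and a Q-value target that bootstraps from whichever of the two candidate actions the critic rates higher. The policy-update stage is not sketched.

```python
import random

def base_controller(state):
    # Hypothetical hand-written controller: push the state toward zero.
    return -0.5 * state

def policy(state):
    # Stand-in for a learned actor network (assumed, untrained).
    return -0.1 * state

def q_value(state, action):
    # Stand-in critic (assumed): prefers actions that bring state + action near zero.
    return -abs(state + action)

def explore_action(state, p_base, rng):
    """With probability p_base follow the base controller, else the noisy policy."""
    if rng.random() < p_base:
        return base_controller(state), "base"
    return policy(state) + rng.gauss(0.0, 0.05), "policy"

def td_target(reward, next_state, gamma=0.99):
    """Bootstrap from the candidate action that the critic rates higher."""
    candidates = [policy(next_state), base_controller(next_state)]
    return reward + gamma * max(q_value(next_state, a) for a in candidates)
```

Under these assumptions, deferring to the controller during exploration keeps early rollouts close to known-safe behavior, while the `max` in the target lets the critic benefit from the controller wherever the learned policy is still worse.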


Residual Policy Learning

We present Residual Policy Learning (RPL): a simple method for improving...

Value Guided Exploration with Sub-optimal Controllers for Learning Dexterous Manipulation

Recently, reinforcement learning has allowed dexterous manipulation skil...

Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards

Connector insertion and many other tasks commonly found in modern manufa...

Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks

During recent years, deep reinforcement learning (DRL) has made successf...

Learning Setup Policies: Reliable Transition Between Locomotion Behaviours

Dynamic platforms that operate over many unique terrain conditions typica...

Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

We present Bayesian Controller Fusion (BCF): a hybrid control strategy t...