Achieving Sample-Efficient and Online-Training-Safe Deep Reinforcement Learning with Base Controllers

by Minjian Xin, et al.

Applying Deep Reinforcement Learning (DRL) algorithms to real-world robotic tasks faces many challenges. On the one hand, reward shaping for complex tasks is difficult and may result in sub-optimal performance. On the other hand, a sparse-reward setting renders exploration inefficient, and exploration using physical robots is costly and unsafe. In this paper we propose a method for learning challenging sparse-reward tasks by utilizing existing controllers. Built upon Deep Deterministic Policy Gradients (DDPG), our algorithm incorporates the controllers into the stages of exploration, Q-value estimation, and policy update. Through experiments ranging from stacking blocks to stacking cups, we present a straightforward way of synthesizing these controllers, and show that the learned state-based or image-based policies steadily outperform them. Compared to previous work on learning from demonstrations, our method improves sample efficiency by orders of magnitude and can learn online in a safe manner. Overall, our method has the potential to leverage existing industrial robot manipulation systems to build more flexible and intelligent controllers.
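The abstract does not spell out how the base controller enters the exploration stage, so the following is only an illustrative sketch of one plausible scheme, not the authors' algorithm. It assumes the agent compares the controller's action with a noisy policy action and executes whichever the critic currently rates higher. The names `base_controller`, `select_action`, `q_value`, and `noise_std` are all hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Action = List[float]


def base_controller(state: Sequence[float]) -> Action:
    """Hypothetical hand-written controller: proportional move toward the origin."""
    return [-0.5 * s for s in state]


def select_action(
    state: Sequence[float],
    policy: Callable[[Sequence[float]], Action],
    controller: Callable[[Sequence[float]], Action],
    q_value: Callable[[Sequence[float], Action], float],
    noise_std: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[Action, str]:
    """Return the exploration action and which source produced it.

    Assumption (not taken from the paper): the controller's action is used
    whenever the critic rates it at least as high as the noisy policy action.
    """
    rng = rng or random.Random()
    # DDPG-style exploration: deterministic policy output plus Gaussian noise.
    policy_action = [a + rng.gauss(0.0, noise_std) for a in policy(state)]
    controller_action = controller(state)
    if q_value(state, controller_action) >= q_value(state, policy_action):
        return controller_action, "controller"
    return policy_action, "policy"
```

Under this assumed rule, an untrained policy defers to the controller early in training, which keeps exploration on trajectories the controller already handles. As the critic improves, the learned policy can take over wherever it is rated higher.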




