Achieving Sample-Efficient and Online-Training-Safe Deep Reinforcement Learning with Base Controllers

11/24/2020
by Minjian Xin, et al.

Applying Deep Reinforcement Learning (DRL) algorithms to real-world robotic tasks faces many challenges. On the one hand, reward shaping for complex tasks is difficult and may result in sub-optimal performance. On the other hand, a sparse-reward setting renders exploration inefficient, and exploration on physical robots is costly and unsafe. In this paper, we propose a method for learning challenging sparse-reward tasks by utilizing existing controllers. Built upon Deep Deterministic Policy Gradient (DDPG), our algorithm incorporates the controllers into the stages of exploration, Q-value estimation, and policy update. Through experiments ranging from stacking blocks to cups, we present a straightforward way of synthesizing these controllers, and show that the learned state-based or image-based policies steadily outperform them. Compared to previous work on learning from demonstrations, our method improves sample efficiency by orders of magnitude and can learn online in a safe manner. Overall, our method has the potential to leverage existing industrial robot manipulation systems to build more flexible and intelligent controllers.
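
The abstract does not spell out how the controllers enter each stage, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes one plausible scheme: during exploration the agent executes whichever of the policy's action or the base controller's action the critic currently rates higher, and the bootstrapped Q-value target takes the larger of the two next-state estimates. All names (`select_action`, `td_target`, `policy`, `base_controller`, `q_value`, `q_target`) are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

State = Sequence[float]
Action = List[float]
ActionFn = Callable[[State], Sequence[float]]
CriticFn = Callable[[State, Action], float]


def select_action(
    state: State,
    policy: ActionFn,
    base_controller: ActionFn,
    q_value: CriticFn,
    noise_std: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[Action, str]:
    """Assumed exploration rule: execute the candidate the critic rates higher."""
    rng = rng or random.Random()
    # DDPG-style exploration: perturb the deterministic policy output.
    a_policy = [a + rng.gauss(0.0, noise_std) for a in policy(state)]
    a_base = list(base_controller(state))
    if q_value(state, a_base) >= q_value(state, a_policy):
        return a_base, "base"
    return a_policy, "policy"


def td_target(
    reward: float,
    next_state: State,
    done: bool,
    policy: ActionFn,
    base_controller: ActionFn,
    q_target: CriticFn,
    gamma: float = 0.99,
) -> float:
    """Assumed critic target: bootstrap from the better of the two candidates."""
    if done:
        return reward
    q_pi = q_target(next_state, list(policy(next_state)))
    q_base = q_target(next_state, list(base_controller(next_state)))
    return reward + gamma * max(q_pi, q_base)
```

Under these assumptions, an untrained policy defers to the base controller whenever the critic prefers it, which is one way a controller could keep early online exploration close to known-safe behaviour. A policy-update term (for example, an imitation loss toward the controller) is omitted here because the abstract gives no detail on it.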

12/15/2018

Residual Policy Learning

We present Residual Policy Learning (RPL): a simple method for improving...
03/06/2023

Value Guided Exploration with Sub-optimal Controllers for Learning Dexterous Manipulation

Recently, reinforcement learning has allowed dexterous manipulation skil...
06/13/2019

Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards

Connector insertion and many other tasks commonly found in modern manufa...
01/11/2022

Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks

During recent years, deep reinforcement learning (DRL) has made successf...
01/23/2021

Learning Setup Policies: Reliable Transition Between Locomotion Behaviours

Dynamic platforms that operate over many unique terrain conditions typica...
07/21/2021

Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

We present Bayesian Controller Fusion (BCF): a hybrid control strategy t...