Learning Dexterous Manipulation from Suboptimal Experts

10/16/2020
by Rae Jeong, et al.

Learning dexterous manipulation in high-dimensional state-action spaces is an important open challenge, with exploration presenting a major bottleneck. Although in many cases the learning process could be guided by demonstrations or other suboptimal experts, current RL algorithms for continuous action spaces often fail to make effective use of combinations of highly off-policy expert data and on-policy exploration data. As a solution, we introduce Relative Entropy Q-Learning (REQ), a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms. It represents the optimal policy via importance sampling from a learned prior and is well suited to taking advantage of mixed data distributions. We demonstrate experimentally that REQ outperforms several strong baselines on robotic manipulation tasks for which suboptimal experts are available. We show how suboptimal experts can be constructed effectively by composing simple waypoint tracking controllers, and we also show how learned primitives can be combined with waypoint controllers to obtain reference behaviors that bootstrap a complex manipulation task on a simulated bimanual robot with human-like hands. Finally, we show that REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations. Videos and further materials are available at sites.google.com/view/rlfse.
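To make the phrase "importance sampling from a learned prior" concrete, here is a minimal sketch of one way such a policy could be sampled. It is not taken from the paper. It assumes the improved policy is proportional to prior(a|s) * exp(Q(s, a) / temperature), which is the usual form for a relative-entropy-regularized objective but is not stated in the abstract. The names `sample_from_improved_policy`, `prior_sample`, `q_value`, `num_samples`, and `temperature` are all hypothetical.

```python
import math
import random


def sample_from_improved_policy(state, prior_sample, q_value,
                                num_samples=16, temperature=1.0, rng=random):
    """Pick one action by self-normalized importance sampling from a prior.

    Assumed (not from the source) target distribution:
        pi(a|s) proportional to prior(a|s) * exp(Q(s, a) / temperature)
    """
    # Draw candidate actions from the prior (hypothetical callable).
    actions = [prior_sample(state) for _ in range(num_samples)]
    q_values = [q_value(state, a) for a in actions]

    # Softmax over Q / temperature, shifted by the max for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Resample a single candidate according to the normalized weights.
    index = rng.choices(range(num_samples), weights=probs, k=1)[0]
    return actions[index], probs


# Toy stand-ins for a learned prior and critic (purely illustrative).
def toy_prior(state):
    return random.gauss(0.0, 1.0)


def toy_q(state, action):
    return -(action - 1.0) ** 2


if __name__ == "__main__":
    random.seed(0)
    action, probs = sample_from_improved_policy(None, toy_prior, toy_q,
                                                temperature=0.1)
    print("selected action:", action)
```

In this sketch, a lower temperature concentrates the weights on the highest-valued candidates, while a higher temperature keeps the selected action closer to a plain draw from the prior.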


research
10/14/2018

Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost

Dexterous multi-fingered robotic hands can perform a wide range of manip...
research
10/28/2021

Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives

Despite the potential of reinforcement learning (RL) for building genera...
research
04/12/2022

When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?

Offline reinforcement learning (RL) algorithms can acquire effective pol...
research
09/21/2022

An Open Tele-Impedance Framework to Generate Large Datasets for Contact-Rich Tasks in Robotic Manipulation

Using large datasets in machine learning has led to outstanding results,...
research
10/26/2022

D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning

While combining imitation learning (IL) and reinforcement learning (RL) ...
research
06/17/2023

Active Policy Improvement from Multiple Black-box Oracles

Reinforcement learning (RL) has made significant strides in various comp...
research
11/13/2019

IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data

Learning from offline task demonstrations is a problem of great interest...
