Learning a Universal Human Prior for Dexterous Manipulation from Human Preference

04/10/2023
by Zihan Ding, et al.

Generating human-like behavior on robots is a great challenge, especially in dexterous manipulation tasks with robotic hands. Even in simulation with no sample constraints, scripting controllers is intractable due to the high degrees of freedom, and manual reward engineering is difficult and often leads to unrealistic motions. Leveraging recent progress on Reinforcement Learning from Human Feedback (RLHF), we propose a framework that learns a universal human prior from direct human preference feedback over videos and uses it to efficiently tune RL policies on 20 dual-hand robot manipulation tasks in simulation, without a single human demonstration. A single task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over their trajectories; it is then applied to regularize policy behavior during the fine-tuning stage. Our method empirically produces more human-like behaviors on robot hands across diverse tasks, including unseen ones, indicating its generalization capability.
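The abstract does not spell out the training objective, but preference-based reward learning of this kind typically fits a Bradley-Terry model over pairs of trajectory segments, and a learned prior can regularize fine-tuning as an additive reward term. A minimal sketch under those assumptions (the function names, the `beta` weight, and the additive shaping form are illustrative, not the paper's implementation):

```python
import numpy as np

def preference_loss(r_a, r_b, pref):
    """Bradley-Terry cross-entropy loss on summed segment rewards.

    r_a, r_b: predicted per-step rewards for two trajectory segments.
    pref: 1.0 if the human preferred segment A, 0.0 if segment B.
    """
    # Probability that A is preferred, from the difference of summed rewards.
    logit = np.sum(r_a) - np.sum(r_b)
    p_a = 1.0 / (1.0 + np.exp(-logit))
    # Cross-entropy against the human label; minimizing this fits the reward
    # model so that preferred segments receive higher total reward.
    return -(pref * np.log(p_a) + (1.0 - pref) * np.log(1.0 - p_a))

def shaped_reward(task_reward, prior_reward, beta=0.1):
    """Fine-tuning reward: task reward regularized by the learned human prior.

    beta trades off task success against human-likeness (assumed form).
    """
    return task_reward + beta * prior_reward
```

With this loss, a reward model that scores the human-preferred segment higher incurs a smaller penalty than one that scores it lower, which is what drives the iterative collect-and-retrain loop described above.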

