Stable In-hand Manipulation with Finger Specific Multi-agent Shadow Reward

by Lingfeng Tao, et al.

Deep Reinforcement Learning (DRL) has shown its capability to handle the high degrees of freedom in control and the complex object interactions of multi-finger dexterous in-hand manipulation tasks. Current DRL approaches prefer sparse rewards to dense rewards for ease of training, but sparse rewards lack behavior constraints during the manipulation process, leading to aggressive and unstable policies that are insufficient for safety-critical in-hand manipulation tasks. Dense rewards can regulate the policy to learn stable manipulation behaviors through continuous reward constraints, but they are hard to define empirically and slow to converge to an optimal policy. This work proposes the Finger-specific Multi-agent Shadow Reward (FMSR) method, which determines stable manipulation constraints in the form of a dense reward based on the state-action occupancy measure, a general utility of DRL that is approximated during the learning process. Information Sharing (IS) across neighboring agents enables consensus training to accelerate convergence. The methods are evaluated on two in-hand manipulation tasks on the Shadow Hand. The results show that FMSR+IS converges faster in training and achieves a higher task success rate and better manipulation stability than a conventional dense reward. Compared with a policy trained with a sparse reward, FMSR+IS achieves a comparable success rate despite the behavior constraint, with much better manipulation stability.
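To make the terms in the abstract concrete, the following is a minimal sketch of how a state-action occupancy measure might be estimated from sampled trajectories, how a dense "shadow" reward could be derived from it, and how neighboring agents could share their estimates. It is not the paper's implementation. Everything here is an assumption made for illustration: the discrete state and action spaces, the entropy utility (a textbook example, chosen only because its gradient is simple, and unrelated to manipulation stability), the ring topology among finger agents, and all function names.

```python
import numpy as np


def empirical_occupancy(trajectories, n_states, n_actions, gamma=0.99):
    """Estimate a discounted state-action occupancy measure.

    Each trajectory is a list of (state, action) integer pairs.
    The result is normalized so that its entries sum to 1.
    """
    occupancy = np.zeros((n_states, n_actions))
    for trajectory in trajectories:
        for t, (s, a) in enumerate(trajectory):
            occupancy[s, a] += gamma ** t
    total = occupancy.sum()
    return occupancy / total if total > 0 else occupancy


def shadow_reward(occupancy, eps=1e-8):
    """Gradient of an illustrative utility with respect to the occupancy measure.

    Assumed utility: F(lam) = -sum(lam * log(lam)), so dF/dlam = -(log(lam) + 1).
    This utility is a placeholder, not the one used in the paper.
    """
    return -(np.log(occupancy + eps) + 1.0)


def share_with_neighbors(estimates, weight=0.5):
    """Blend each agent's estimate with the mean of its two ring neighbors.

    The ring topology and the blending weight are assumptions of this sketch.
    """
    n = len(estimates)
    shared = []
    for i in range(n):
        neighbor_mean = 0.5 * (estimates[(i - 1) % n] + estimates[(i + 1) % n])
        shared.append((1.0 - weight) * estimates[i] + weight * neighbor_mean)
    return shared


# Usage: three hypothetical finger agents, each with its own sampled trajectories.
agent_trajectories = [
    [[(0, 0), (1, 1)], [(0, 0), (0, 1)]],
    [[(1, 0), (1, 1)]],
    [[(0, 1), (1, 0), (1, 1)]],
]
estimates = [empirical_occupancy(t, 2, 2, gamma=0.5) for t in agent_trajectories]
shared = share_with_neighbors(estimates)
rewards = [shadow_reward(lam) for lam in shared]
```

In this sketch each agent computes its reward from the blended estimate rather than from its own, which is one simple way to read "consensus training"; the paper may define the sharing step differently.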




