Vision-Based Manipulators Need to Also See from Their Hands

by Kyle Hsu, et al.

We study how the choice of visual perspective affects learning and generalization in the context of physical manipulation from raw sensor observations. Compared with the more commonly used global third-person perspective, a hand-centric (eye-in-hand) perspective affords reduced observability, but we find that it consistently improves training efficiency and out-of-distribution generalization. These benefits hold across a variety of learning algorithms, experimental settings, and distribution shifts, and for both simulated and real robot apparatuses. However, this is only the case when hand-centric observability is sufficient; otherwise, including a third-person perspective is necessary for learning, but also harms out-of-distribution generalization. To mitigate this, we propose to regularize the third-person information stream via a variational information bottleneck. On six representative manipulation tasks with varying hand-centric observability adapted from the Meta-World benchmark, this results in a state-of-the-art reinforcement learning agent operating from both perspectives improving its out-of-distribution generalization on every task. While some practitioners have long put cameras in the hands of robots, our work systematically analyzes the benefits of doing so and provides simple and broadly applicable insights for improving end-to-end learned vision-based robotic manipulation.
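As a rough illustration of the regularizer mentioned above, a variational information bottleneck is commonly implemented as a KL penalty, weighted by a coefficient, on a stochastic encoding of the regularized input (here, the third-person stream). The sketch below is not the authors' implementation. It assumes a diagonal Gaussian encoder, a standard normal prior, and a hypothetical weight `beta`; the function names are illustrative only.

```python
import math
from typing import Sequence


def gaussian_kl_to_standard_normal(mean: Sequence[float],
                                   log_var: Sequence[float]) -> float:
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Assumption: the third-person encoder outputs a diagonal Gaussian and the
    prior is a standard normal. The abstract does not specify either choice.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mean, log_var))


def regularized_loss(task_loss: float,
                     mean: Sequence[float],
                     log_var: Sequence[float],
                     beta: float = 1e-3) -> float:
    """Task loss plus a beta-weighted bottleneck penalty.

    `beta` is a hypothetical hyperparameter; its default is a placeholder.
    """
    return task_loss + beta * gaussian_kl_to_standard_normal(mean, log_var)
```

A larger `beta` pushes the third-person encoding toward the prior, so the agent leans less on that stream; the hand-centric stream is left unregularized in this sketch.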




