Multi-View Masked World Models for Visual Robotic Manipulation

02/05/2023
by Younggyo Seo, et al.

Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation. Specifically, we train a multi-view masked autoencoder that reconstructs the pixels of randomly masked viewpoints, and then learn a world model operating on the representations from the autoencoder. We demonstrate the effectiveness of our method in a range of scenarios, including multi-view control and single-view control with auxiliary cameras for representation learning. We also show that the multi-view masked autoencoder, trained with multiple randomized viewpoints, enables training a policy under strong viewpoint randomization and transferring that policy to real-robot tasks without camera calibration or an adaptation procedure. Video demonstrations of the real-world experiments and source code are available at the project website: https://sites.google.com/view/mv-mwm.
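The core mechanism the abstract describes, masking entire camera viewpoints at random and reconstructing their pixels, can be sketched in a few lines. The sketch below is a hypothetical illustration, not the authors' released implementation: the class name, network sizes, patch layout, and the 0.5 masking ratio are assumptions made for exposition.

```python
import torch
import torch.nn as nn

class MultiViewMaskedAutoencoder(nn.Module):
    """Minimal sketch of view-level masked autoencoding: embed patches
    from several camera views, replace all tokens of randomly chosen
    views with a shared mask token, and reconstruct their pixels.
    Architecture and dimensions are illustrative assumptions."""

    def __init__(self, num_views=2, img_size=64, patch=8, dim=256):
        super().__init__()
        self.num_views = num_views
        self.patches_per_view = (img_size // patch) ** 2
        self.patch_dim = 3 * patch * patch
        self.embed = nn.Linear(self.patch_dim, dim)
        # Learned per-view embedding so the encoder knows which camera
        # each token came from.
        self.view_embed = nn.Parameter(torch.zeros(num_views, 1, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=4)
        self.decoder = nn.Linear(dim, self.patch_dim)

    def forward(self, patches, mask_ratio=0.5):
        # patches: (B, num_views, patches_per_view, patch_dim)
        B = patches.shape[0]
        tokens = self.embed(patches) + self.view_embed
        # Mask whole viewpoints at random: every token of a masked view
        # is replaced by the shared mask token.
        view_mask = torch.rand(
            B, self.num_views, 1, 1, device=patches.device) < mask_ratio
        tokens = torch.where(view_mask, self.mask_token, tokens)
        # Process all views as one joint sequence: (B, V*P, dim).
        latent = self.encoder(tokens.flatten(1, 2))
        recon = self.decoder(latent).view_as(patches)
        # Pixel reconstruction loss, averaged over the masked views only.
        per_view = ((recon - patches) ** 2).mean(dim=(2, 3))  # (B, V)
        mask2d = view_mask.squeeze(-1).squeeze(-1)            # (B, V)
        loss = (per_view * mask2d).sum() / view_mask.sum().clamp(min=1)
        return latent, loss

if __name__ == "__main__":
    model = MultiViewMaskedAutoencoder()
    # Two 64x64 views, patchified into 8x8 patches: (B, V, 64, 3*8*8).
    patches = torch.randn(4, 2, 64, 192)
    latent, loss = model(patches)
    print(latent.shape, loss.item())  # torch.Size([4, 128, 256]), scalar
```

In the paper's setting, the encoder's latent tokens would then serve as the observation representation on which the world model operates; here `latent` stands in for that role.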


Related research

06/26/2023 · RVT: Robotic View Transformer for 3D Object Manipulation
For 3D object manipulation, methods that build an explicit 3D representa...

02/21/2020 · Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras
In this work, we present an effective multi-view approach to closed-loop...

03/13/2023 · Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks
The use of multi-camera views simultaneously has been shown to improve t...

07/07/2023 · Polybot: Training One Policy Across Robots While Embracing Variability
Reusing large datasets is crucial to scale vision-based robotic manipula...

06/09/2023 · Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots
Improving the generalization capabilities of general-purpose robotic age...

04/25/2023 · MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes
Manipulation relationship detection (MRD) aims to guide the robot to gra...

06/30/2023 · Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation
3D perceptual representations are well suited for robot manipulation as ...
