Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots

06/09/2023
by Jiange Yang, et al.

Improving the generalization capabilities of general-purpose robotic agents has long been a significant challenge actively pursued by the research community. Existing approaches often rely on collecting large-scale real-world robotic data, such as the RT-1 dataset. However, these approaches typically suffer from low sample efficiency, which limits their capability in open-domain scenarios with new objects and diverse backgrounds. In this paper, we propose a novel paradigm that effectively leverages language-grounded segmentation masks generated by state-of-the-art foundation models to address a wide range of pick-and-place robot manipulation tasks in everyday scenarios. By integrating the precise semantics and geometries conveyed by these masks into our multi-view policy model, our approach perceives accurate object poses and enables sample-efficient learning. Moreover, this design facilitates generalization to grasping new objects whose shapes are similar to those observed during training. Our approach consists of two distinct steps. First, we introduce a series of foundation models to accurately ground natural-language demands across multiple tasks. Second, we develop a Multi-modal Multi-view Policy Model that incorporates inputs such as RGB images, semantic masks, and robot proprioception states to jointly predict precise and executable robot actions. Extensive real-world experiments conducted on a Franka Emika robot arm validate the effectiveness of our proposed paradigm. Real-world demos are available on YouTube (https://www.youtube.com/watch?v=1m9wNzfp_4E ) and Bilibili (https://www.bilibili.com/video/BV178411Z7H2/ ).
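To make the two-step paradigm concrete, below is a minimal PyTorch sketch. Everything in it is illustrative rather than the authors' released code: ground_instruction_to_mask is a placeholder stub standing in for the paper's chain of foundation models, and the layer sizes, view count, and 7-DoF action parameterization of MultiViewPolicyModel are assumptions. What it illustrates is the data flow the abstract describes: each camera view is encoded together with its language-grounded mask, and the per-view features are fused with proprioception to predict an action.

```python
import torch
import torch.nn as nn

def ground_instruction_to_mask(rgb: torch.Tensor, instruction: str) -> torch.Tensor:
    """Step 1 stand-in: the paper grounds the language demand with foundation
    models (e.g. open-vocabulary detection plus promptable segmentation).
    This hypothetical stub just returns an empty mask of matching size."""
    return torch.zeros(rgb.shape[0], 1, rgb.shape[-2], rgb.shape[-1])

class MultiViewPolicyModel(nn.Module):
    """Step 2 sketch (assumed architecture): a shared CNN encodes each view's
    RGB image concatenated with its semantic mask; features from all views
    are fused with proprioception to regress a robot action."""

    def __init__(self, num_views: int = 2, proprio_dim: int = 7, action_dim: int = 7):
        super().__init__()
        # Shared encoder over a 4-channel input: RGB plus the binary mask.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * num_views + proprio_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, views, masks, proprio):
        # views: (B, V, 3, H, W); masks: (B, V, 1, H, W); proprio: (B, proprio_dim)
        feats = [self.encoder(torch.cat([views[:, v], masks[:, v]], dim=1))
                 for v in range(views.shape[1])]
        return self.head(torch.cat(feats + [proprio], dim=1))

# Smoke test with random tensors standing in for two camera views.
policy = MultiViewPolicyModel()
rgb = torch.rand(1, 2, 3, 128, 128)
masks = torch.stack([ground_instruction_to_mask(rgb[:, v], "pick up the red mug")
                     for v in range(2)], dim=1)
proprio = torch.rand(1, 7)
print(policy(rgb, masks, proprio).shape)  # torch.Size([1, 7])
```

Concatenating the mask as an extra image channel is one simple way to inject the mask's semantics and geometry into the policy; the paper's actual fusion mechanism may differ.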


Related research

02/05/2023 · Multi-View Masked World Models for Visual Robotic Manipulation
Visual robotic manipulation research and applications often use multiple...

09/18/2023 · Grasp-Anything: Large-scale Grasp Dataset from Foundation Models
Foundation models such as ChatGPT have made significant strides in robot...

05/18/2023 · Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Foundation models have made significant strides in various applications,...

02/14/2023 · ConceptFusion: Open-set Multimodal 3D Mapping
Building 3D maps of the environment is central to robot navigation, plan...

07/24/2023 · simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects
Existing robotic systems have a clear tension between generality and pre...

10/24/2019 · RoboNet: Large-Scale Multi-Robot Learning
Robot learning has emerged as a promising tool for taming the complexity...

08/07/2023 · Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots
This paper describes a strategy for implementing a robotic system capabl...
