Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

09/14/2023
by Adam Rashid, et al.

Grasping objects by a specific part is often crucial for safety and for executing downstream tasks. Yet learning-based grasp planners lack this behavior unless they are trained on specific object part data, which makes scaling to diverse objects a significant challenge. Instead, we propose LERF-TOGO (Language Embedded Radiance Fields for Task-Oriented Grasping of Objects), which uses vision-language models zero-shot to output a grasp distribution over an object given a natural language query. To accomplish this, we first reconstruct a LERF of the scene, which distills CLIP embeddings into a multi-scale 3D language field queryable with text. However, LERF has no sense of objectness, so its relevancy outputs often return incomplete activations over an object, which are insufficient for subsequent part queries. LERF-TOGO mitigates this lack of spatial grouping by extracting a 3D object mask via DINO features and then conditionally querying LERF on this mask to obtain a semantic distribution over the object, which is used to rank grasps from an off-the-shelf grasp planner. We evaluate LERF-TOGO's ability to grasp task-oriented object parts on 31 different physical objects and find that it selects grasps on the correct part in 81% of all trials. See the project website at: lerftogo.github.io
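
The ranking step described above (restrict a part query to an object mask, then use the resulting semantic distribution to rank grasps) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Grasp` type, the function names, the peak normalization, the radius-based averaging, and the product of semantic score and planner quality are all assumptions made for illustration, and the relevancy values and mask stand in for what LERF and DINO-based grouping would produce.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Grasp:
    """Hypothetical grasp candidate from an off-the-shelf planner."""
    position: Point  # grasp center in the scene frame (meters)
    quality: float   # planner's geometric confidence, assumed in [0, 1]


def conditional_part_scores(part_relevancy: List[float],
                            object_mask: List[bool]) -> List[float]:
    """Zero out part relevancy outside the object mask, then scale to [0, 1]."""
    masked = [r if m else 0.0 for r, m in zip(part_relevancy, object_mask)]
    peak = max(masked, default=0.0)
    if peak <= 0.0:
        return [0.0] * len(masked)
    return [r / peak for r in masked]


def semantic_score(grasp: Grasp, points: List[Point],
                   scores: List[float], radius: float) -> float:
    """Average the part score of all points within `radius` of the grasp."""
    near = [s for p, s in zip(points, scores)
            if math.dist(p, grasp.position) <= radius]
    return sum(near) / len(near) if near else 0.0


def rank_grasps(grasps: List[Grasp], points: List[Point],
                part_relevancy: List[float], object_mask: List[bool],
                radius: float = 0.03) -> List[Tuple[float, Grasp]]:
    """Return (score, grasp) pairs sorted from best to worst."""
    scores = conditional_part_scores(part_relevancy, object_mask)
    scored = [(semantic_score(g, points, scores, radius) * g.quality, g)
              for g in grasps]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy scene: a mug handle, the mug body, and a spurious activation on the table.
points = [(0.10, 0.0, 0.0), (0.0, 0.0, 0.0), (0.50, 0.0, 0.0)]
part_relevancy = [0.8, 0.2, 0.9]   # made-up relevancy for the query "handle"
object_mask = [True, True, False]  # made-up object mask: table point excluded
grasps = [Grasp((0.0, 0.0, 0.0), 0.9),   # body grasp
          Grasp((0.10, 0.0, 0.0), 0.7),  # handle grasp
          Grasp((0.50, 0.0, 0.0), 0.95)] # grasp on the table activation

for score, grasp in rank_grasps(grasps, points, part_relevancy, object_mask):
    print(round(score, 3), grasp.position)
```

In this toy scene the table point has the highest raw relevancy, but masking removes it before ranking, so the handle grasp ranks first even though the planner rated it below the other two candidates.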


