INVIGORATE: Interactive Visual Grounding and Grasping in Clutter

08/25/2021
by Hanbo Zhang, et al.

This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE addresses several challenges: (i) inferring the target object among other occluding objects, from input language expressions and RGB images, (ii) inferring object blocking relationships (OBRs) from the images, and (iii) synthesizing a multi-step plan to ask questions that disambiguate the target object and to grasp it successfully. We train separate neural networks for object detection, visual grounding, question generation, and OBR detection and grasping. They allow for unrestricted object categories and language expressions, subject to the training datasets. However, errors in visual perception and ambiguity in human language are inevitable and negatively impact the robot's performance. To overcome these uncertainties, we build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules. Through approximate POMDP planning, the robot tracks the history of observations and asks disambiguation questions in order to achieve a near-optimal sequence of actions that identify and grasp the target object. INVIGORATE combines the benefits of model-based POMDP planning and data-driven deep learning. Preliminary experiments with INVIGORATE on a Fetch robot show significant benefits of this integrated approach to object grasping in clutter with natural language interactions. A demonstration video is available at https://youtu.be/zYakh80SGcU.
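The core idea in the abstract, tracking a belief over candidate target objects and choosing between asking a disambiguation question and grasping, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the threshold, the scoring model, and all names are hypothetical, and the real system plans over a richer POMDP that also models object blocking relationships (OBRs).

```python
# Hypothetical sketch of belief tracking and action selection for
# interactive visual grounding. The threshold and scores are assumptions.

GRASP_THRESHOLD = 0.8  # assumed confidence required before grasping


def update_belief(belief, likelihoods):
    """Bayes update: posterior[i] is proportional to belief[i] * likelihoods[i]."""
    posterior = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(posterior)
    if total == 0:
        return belief  # uninformative observation; keep the prior
    return [p / total for p in posterior]


def choose_action(belief):
    """Grasp the most likely candidate if confident enough, else ask a question."""
    best = max(range(len(belief)), key=lambda i: belief[i])
    if belief[best] >= GRASP_THRESHOLD:
        return ("grasp", best)
    return ("ask", best)


# Example: three candidate objects, ambiguous visual grounding scores.
belief = [1 / 3] * 3
belief = update_belief(belief, [0.5, 0.4, 0.1])   # grounding is inconclusive
action, idx = choose_action(belief)               # -> ("ask", 0)

# A "yes" answer about candidate 0 sharpens the belief.
belief = update_belief(belief, [0.9, 0.05, 0.05])
action, idx = choose_action(belief)               # -> ("grasp", 0)
```

The point of the sketch is the loop structure: each observation (grounding scores or a user's answer) refines the belief, and the robot only commits to a grasp once the belief is sufficiently peaked.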


Related research:

- Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction (06/11/2018)
- Interactive Robotic Grasping with Attribute-Guided Disambiguation (03/15/2022)
- A Joint Network for Grasp Detection Conditioned on Natural Language Commands (04/01/2021)
- Online Tool Selection with Learned Grasp Prediction Models (02/15/2023)
- Ab Initio Particle-based Object Manipulation (07/19/2021)
- DFBVS: Deep Feature-Based Visual Servo (01/20/2022)
- Prioritized Planning for Target-Oriented Manipulation via Hierarchical Stacking Relationship Prediction (03/14/2023)
