Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction

07/18/2017
by Mohit Shridhar, et al.

Human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects described in unconstrained natural language. A core issue for the system is semantic and spatial grounding, that is, inferring objects and their spatial relationships from images and natural language expressions. We introduce a two-stage neural-network grounding pipeline that maps natural language referring expressions directly to objects in the images. The first stage uses the visual descriptions in a referring expression to generate a candidate set of relevant objects. The second stage examines all pairwise relationships between the candidates and predicts the most likely referred object according to the spatial descriptions in the referring expression. A key feature of our system is that, by leveraging a large dataset of images labeled with text descriptions, it allows unrestricted object types and natural language referring expressions. Preliminary results indicate that our system outperforms a near state-of-the-art object comprehension system on standard benchmark datasets. We also present a robot system that follows voice commands to pick and place previously unseen objects.
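
The two-stage structure described above can be illustrated with a small toy sketch. This is not the authors' implementation: the paper uses neural networks for both stages, whereas here a word-overlap score and bounding-box centres stand in for them. All names (`Box`, `visual_score`, `relation_score`, `ground`), the parameter `k`, and the two supported relations are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    label: str  # toy stand-in for a learned visual description of the region
    x: float    # horizontal centre of the bounding box (0 = left, 1 = right)
    y: float    # vertical centre of the bounding box


def visual_score(box, words):
    """Toy visual match: fraction of the query words found in the label."""
    return len(set(box.label.split()) & set(words)) / max(len(words), 1)


def relation_score(subject, other, relation):
    """Toy spatial match: how strongly `subject` stands in `relation` to `other`."""
    if relation == "left of":
        return other.x - subject.x
    if relation == "right of":
        return subject.x - other.x
    raise ValueError(f"unsupported relation: {relation}")


def ground(boxes, subject_words, context_words, relation, k=3):
    """Return the box most likely referred to by the expression."""
    # Stage 1: keep the k boxes that best match the visual words.
    all_words = subject_words + context_words
    ranked = sorted(boxes, key=lambda b: visual_score(b, all_words), reverse=True)
    candidates = ranked[:k]

    # Stage 2: score every ordered pair of candidates and keep the best subject.
    def pair_score(pair):
        subject, other = pair
        return (visual_score(subject, subject_words)
                + visual_score(other, context_words)
                + relation_score(subject, other, relation))

    subject, _ = max(permutations(candidates, 2), key=pair_score)
    return subject


# "the red cup left of the bowl"
boxes = [
    Box("red cup", 0.2, 0.5),
    Box("red cup", 0.8, 0.5),
    Box("white bowl", 0.5, 0.5),
    Box("blue book", 0.9, 0.1),
]
print(ground(boxes, ["red", "cup"], ["bowl"], "left of"))
```

In this sketch the visual words alone cannot separate the two red cups; only the pairwise spatial stage does, which is the role the abstract assigns to the second stage.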
