Differentiable Parsing and Visual Grounding of Verbal Instructions for Object Placement

10/01/2022
by Zirui Zhao, et al.

Grounding spatial relations in natural language for object placement is difficult because instructions can be both ambiguous and compositional. To address these issues, we introduce ParaGon, a PARsing And visual GrOuNding framework for language-conditioned object placement. It parses language instructions into relations between objects and grounds those objects in visual scenes. A particle-based graph neural network (GNN) then performs relational reasoning over the grounded objects to generate placements. ParaGon encodes all of these procedures in neural networks for end-to-end training, which avoids the need to annotate labels for parsing and object reference grounding. Our approach integrates parsing-based methods into a probabilistic, data-driven framework. It is data-efficient and generalizable when learning compositional instructions, robust to noisy language inputs, and adapts to the uncertainty of ambiguous instructions.
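
To make the idea of representing an ambiguous placement with particles more concrete, here is a minimal Python sketch. It is not ParaGon's implementation: the hard-coded triple, the `relation_score` function, the `place` helper, and the unit-square workspace are all hypothetical stand-ins for what the framework learns end to end with neural networks. It only illustrates how a set of weighted candidate positions can capture the uncertainty of an instruction such as "place the mug to the left of the plate".

```python
import math
import random

# Hypothetical parsed instruction: (subject, relation, reference object).
# In this sketch the triple is hard-coded rather than produced by a learned parser.
INSTRUCTION = ("mug", "left_of", "plate")


def relation_score(pos, ref_pos, relation):
    """Hand-written stand-in for a learned relation model (an assumption)."""
    dx = pos[0] - ref_pos[0]
    dy = pos[1] - ref_pos[1]
    if relation == "left_of":
        # Soft preference for smaller x than the reference, at a similar y.
        left = 1.0 / (1.0 + math.exp(10.0 * dx))
        level = math.exp(-(dy * dy) / 0.02)
        return left * level
    raise ValueError("unsupported relation: " + relation)


def place(triple, grounded, n_particles=500, seed=0):
    """Weight uniformly sampled candidate positions by the relation score."""
    rng = random.Random(seed)
    _, relation, reference = triple
    ref_pos = grounded[reference]
    # Particles are candidate placements in a unit-square workspace.
    particles = [(rng.random(), rng.random()) for _ in range(n_particles)]
    weights = [relation_score(p, ref_pos, relation) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The weighted mean is one simple way to read out a single placement.
    x = sum(w * p[0] for w, p in zip(weights, particles))
    y = sum(w * p[1] for w, p in zip(weights, particles))
    return (x, y), particles, weights


if __name__ == "__main__":
    # Hypothetical grounding result: the plate was located at (0.6, 0.5).
    estimate, _, _ = place(INSTRUCTION, {"plate": (0.6, 0.5)})
    print("suggested placement:", estimate)
```

Keeping the whole weighted particle set, rather than only the mean, preserves multiple plausible placements when an instruction is ambiguous.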
