Language-Conditioned Observation Models for Visual Object Search

09/13/2023
by   Thao Nguyen, et al.
0

Object search is a challenging task because when given complex language descriptions (e.g., "find the white cup on the table"), the robot must move its camera through the environment and recognize the described object. Previous works map language descriptions to a set of fixed object detectors with predetermined noise models, but these approaches are challenging to scale because new detectors need to be made for each object. In this work, we bridge the gap in realistic object search by posing the search problem as a partially observable Markov decision process (POMDP) where the object detector and visual sensor noise in the observation model is determined by a single Deep Neural Network conditioned on complex language descriptions. We incorporate the neural network's outputs into our language-conditioned observation model (LCOM) to represent dynamically changing sensor noise. With an LCOM, any language description of an object can be used to generate an appropriate object detector and noise model, and training an LCOM only requires readily available supervised image-caption datasets. We empirically evaluate our method by comparing against a state-of-the-art object search algorithm in simulation, and demonstrate that planning with our observation model yields a significantly higher average task completion rate (from 0.46 to 0.66) and more efficient and quicker object search than with a fixed-noise model. We demonstrate our method on a Boston Dynamics Spot robot, enabling it to handle complex natural language object descriptions and efficiently find objects in a room-scale environment.

READ FULL TEXT

page 1

page 5

page 6

research
12/04/2020

Spatial Language Understanding for Object Search in Partially Observed Cityscale Environments

We present a system that enables robots to interpret spatial language as...
research
05/06/2020

Multi-Resolution POMDP Planning for Multi-Object Search in 3D

Robots operating in household environments must find objects on shelves,...
research
06/23/2020

Robot Object Retrieval with Contextual Natural Language Queries

Natural language object retrieval is a highly useful yet challenging tas...
research
02/13/2023

Paparazzi: A Deep Dive into the Capabilities of Language and Vision Models for Grounding Viewpoint Descriptions

Existing language and vision models achieve impressive performance in im...
research
01/24/2023

Generalized Object Search

Future collaborative robots must be capable of finding objects. As such ...
research
10/19/2021

Towards Optimal Correlational Object Search

In realistic applications of object search, robots will need to locate t...
research
04/08/2021

Exploiting Natural Language for Efficient Risk-Aware Multi-robot SaR Planning

The ability to develop a high-level understanding of a scene, such as pe...

Please sign up or login with your details

Forgot password? Click here to reset