RREx-BoT: Remote Referring Expressions with a Bag of Tricks

01/30/2023
by Gunnar A. Sigurdsson, et al.

Household robots operate in the same space for years. Such robots incrementally build dynamic maps that can be used for tasks requiring remote object localization. However, benchmarks in robot learning often test generalization through inference on tasks in unobserved environments. In an observed environment, locating an object reduces to choosing from among all object proposals in the environment, which may number in the 100,000s. Armed with this intuition, using only a generic vision-language scoring model with minor modifications for 3D encoding and operating in an embodied environment, we demonstrate an absolute performance gain of 9.84 above state-of-the-art models for REVERIE and of 5.04 [...] pre-explore an environment, we also exceed the previous state-of-the-art pre-exploration method on REVERIE. Additionally, we demonstrate our model on a real-world TurtleBot platform, highlighting the simplicity and usefulness of the approach. Our analysis outlines a "bag of tricks" essential for accomplishing this task, from utilizing 3D coordinates and context, to generalizing vision-language models to large 3D search spaces.
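The idea of reducing localization to a choice among stored object proposals can be illustrated with a toy sketch. This is not the authors' model: the `Proposal` structure, the `encode` step that simply appends 3D coordinates to a feature vector, and the cosine score are all assumptions made for illustration, standing in for a learned vision-language scoring model.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Proposal:
    """One object proposal stored in the robot's map (hypothetical structure)."""
    name: str
    feature: Sequence[float]          # toy stand-in for a visual embedding
    xyz: Tuple[float, float, float]   # 3D position in the map frame


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def encode(p: Proposal) -> list:
    """Assumed 3D encoding: append the coordinates to the visual feature."""
    return list(p.feature) + list(p.xyz)


def locate(query: Sequence[float], proposals: Sequence[Proposal]) -> Proposal:
    """Score every proposal against the query and return the best one."""
    return max(proposals, key=lambda p: cosine(query, encode(p)))


# Toy map: two mugs that look alike but sit in different places, plus a lamp.
proposals = [
    Proposal("mug_kitchen", [1.0, 0.0], (0.0, 0.0, 1.0)),
    Proposal("mug_bedroom", [1.0, 0.0], (5.0, 0.0, 1.0)),
    Proposal("lamp_bedroom", [0.0, 1.0], (5.0, 0.0, 1.0)),
]

# Hypothetical query embedding for "the mug in the bedroom".
query = [1.0, 0.0, 5.0, 0.0, 1.0]
best = locate(query, proposals)
print(best.name)
```

In this toy setup the visual feature alone cannot separate the two mugs; only the appended coordinates do, which is the kind of role the abstract attributes to 3D information. A real map with proposals numbering in the 100,000s would need batched scoring rather than a Python loop.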

research
04/23/2019

RERERE: Remote Embodied Referring Expressions in Real indoor Environments

One of the long-term challenges of robotics is to enable humans to commu...
research
03/30/2023

coExplore: Combining multiple rankings for multi-robot exploration

Multi-robot exploration is a field which tackles the challenge of explor...
research
04/03/2023

Navigating to Objects Specified by Images

Images are a convenient way to specify which particular object instance ...
research
03/02/2023

Open-World Object Manipulation using Pre-trained Vision-Language Models

For robots to follow instructions from people, they must be able to conn...
research
03/10/2023

Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors

In recent years, a number of models that learn the relations between vis...
research
06/30/2023

Statler: State-Maintaining Language Models for Embodied Reasoning

Large language models (LLMs) provide a promising tool that enable robots...
research
12/13/2021

Contact-Rich Manipulation of a Flexible Object based on Deep Predictive Learning using Vision and Tactility

We achieved contact-rich flexible object manipulation, which was difficu...