Text2Pos: Text-to-Point-Cloud Cross-Modal Localization

03/28/2022
by Manuel Kolmet, et al.

Natural language-based communication with mobile devices and home appliances is becoming increasingly popular and has the potential to become natural for communicating with mobile robots in the future. Towards this goal, we investigate cross-modal text-to-point-cloud localization that will allow us to specify, for example, a vehicle pick-up or goods delivery location. In particular, we propose Text2Pos, a cross-modal localization module that learns to align textual descriptions with localization cues in a coarse-to-fine manner. Given a point cloud of the environment, Text2Pos locates a position that is specified via a natural language-based description of the immediate surroundings. To train Text2Pos and study its performance, we construct KITTI360Pose, the first dataset for this task based on the recently introduced KITTI360 dataset. Our experiments show that we can localize 65% of textual queries within 15m distance to query locations for top-10 retrieved locations. This is a starting point that we hope will spark future developments towards language-based navigation.
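To make the reported metric concrete, the following is a minimal sketch of how a top-k localization rate at a distance threshold could be computed. It is not the authors' evaluation code. The function name `topk_localization_recall`, the toy embeddings, and the cell centers are all hypothetical, and the sketch assumes that a query counts as localized when at least one of its top-k retrieved locations lies within the threshold of the true position. Only a coarse, retrieval-style ranking by cosine similarity is shown; the fine stage of the coarse-to-fine pipeline is omitted.

```python
import math


def topk_localization_recall(query_embs, cell_embs, cell_centers,
                             true_positions, k=10, threshold_m=15.0):
    """Fraction of queries with at least one top-k retrieved cell whose
    center lies within threshold_m of the true query position.

    All inputs are hypothetical stand-ins: embeddings are plain lists of
    floats, and positions are (x, y) tuples in meters.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    hits = 0
    for query, position in zip(query_embs, true_positions):
        # Rank all cells by similarity to the text query, best first.
        ranked = sorted(range(len(cell_embs)),
                        key=lambda i: cosine(query, cell_embs[i]),
                        reverse=True)
        # The query is a hit if any of the k best cells is close enough.
        if any(math.dist(cell_centers[i], position) <= threshold_m
               for i in ranked[:k]):
            hits += 1
    return hits / len(query_embs)


# Toy data (assumed for illustration only).
cell_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
cell_centers = [(0.0, 0.0), (100.0, 0.0), (50.0, 50.0)]
query_embs = [[0.9, 0.1], [0.1, 0.9]]
true_positions = [(5.0, 5.0), (0.0, 0.0)]

print(topk_localization_recall(query_embs, cell_embs, cell_centers,
                               true_positions, k=1))
print(topk_localization_recall(query_embs, cell_embs, cell_centers,
                               true_positions, k=3))
```

In this toy setup, the second query's best-matching cell is far from its true position, so it only counts as localized once k is large enough to include a nearby cell. This illustrates why results of this kind are reported per value of k and per distance threshold.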

research
01/13/2023

Text to Point Cloud Localization with Relation-Enhanced Transformer

Automatically localizing a position based on a few natural language inst...
research
05/05/2021

Audio Retrieval with Natural Language Queries

We consider the task of retrieving audio using free-form natural languag...
research
09/12/2020

RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

We study an important, yet largely unexplored problem of large-scale cro...
research
08/01/2022

CSDN: Cross-modal Shape-transfer Dual-refinement Network for Point Cloud Completion

How will you repair a physical object with some missings? You may imagin...
research
04/04/2019

ExCL: Extractive Clip Localization Using Natural Language Descriptions

The task of retrieving clips within videos based on a given natural lang...
research
05/05/2017

TALL: Temporal Activity Localization via Language Query

This paper focuses on temporal localization of actions in untrimmed vide...
research
12/18/2019

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

We introduce the new task of 3D object localization in RGB-D scans using...
