ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

12/18/2019
by Dave Zhenyu Chen, et al.

We introduce the new task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. To address this task, we propose ScanRefer, where the core idea is to learn a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor then correlates the language expressions with the underlying geometric features of the 3D scan and facilitates the regression of the 3D bounding box of the target object. In order to train and benchmark our method, we introduce a new ScanRefer dataset, containing 46,173 descriptions of 9,943 objects from 703 ScanNet scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D.
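
The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `localize`, the feature dimensions, the single ReLU fusion layer, and the random weights are all assumptions made for illustration. The sketch also selects the highest-scoring proposal box rather than regressing a box as the abstract describes.

```python
import numpy as np

def localize(proposal_feats, proposal_boxes, sentence_emb, w_fuse, w_score):
    """Return the proposal box whose fused descriptor scores highest.

    proposal_feats: (K, Dp) geometric features, one row per 3D object proposal
    proposal_boxes: (K, 6) boxes as (cx, cy, cz, dx, dy, dz)
    sentence_emb:   (Dl,) encoded description
    w_fuse:         (Dp + Dl, H) hypothetical fusion weights
    w_score:        (H,) hypothetical scoring weights
    """
    k = proposal_feats.shape[0]
    # Attach the same sentence embedding to every proposal feature.
    tiled = np.tile(sentence_emb, (k, 1))
    fused_in = np.concatenate([proposal_feats, tiled], axis=1)
    # A single ReLU layer stands in for the learned fused descriptor.
    fused = np.maximum(fused_in @ w_fuse, 0.0)
    scores = fused @ w_score
    # Numerically stable softmax over the K proposals.
    exp = np.exp(scores - scores.max())
    conf = exp / exp.sum()
    best = int(np.argmax(conf))
    return proposal_boxes[best], conf

# Toy inputs: 8 proposals, 16-d geometry features, 12-d sentence embedding.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
boxes = rng.uniform(-1.0, 1.0, size=(8, 6))
sent = rng.normal(size=12)
w_fuse = rng.normal(size=(28, 32))
w_score = rng.normal(size=32)

box, conf = localize(feats, boxes, sent, w_fuse, w_score)
```

In a trained system the weights would be learned jointly with the proposal and language encoders; here they are random, so the selected box is arbitrary and only the data flow is meaningful.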


research
12/20/2021

ScanQA: 3D Question Answering for Spatial Scene Understanding

We propose a new 3D spatial understanding task of 3D Question Answering ...
research
09/11/2023

Multi3DRefer: Grounding Text Description to Multiple 3D Objects

We introduce the task of localizing a flexible number of objects in real...
research
06/23/2020

Robot Object Retrieval with Contextual Natural Language Queries

Natural language object retrieval is a highly useful yet challenging tas...
research
08/16/2019

RIO: 3D Object Instance Re-Localization in Changing Indoor Environments

In this work, we introduce the task of 3D object instance re-localizatio...
research
05/26/2018

Using Syntax to Ground Referring Expressions in Natural Images

We introduce GroundNet, a neural network for referring expression recogn...
research
03/28/2022

Text2Pos: Text-to-Point-Cloud Cross-Modal Localization

Natural language-based communication with mobile devices and home applia...
research
12/12/2022

ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes

The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natura...
