Multi3DRefer: Grounding Text Description to Multiple 3D Objects

09/11/2023
by Yiming Zhang, et al.

We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions. Existing 3D visual grounding tasks focus on localizing a unique object given a text description. However, such a strict setting is unnatural, since localizing potentially multiple objects is a common need in real-world scenarios and robotic tasks (e.g., visual navigation and object rearrangement). To address this setting, we propose Multi3DRefer, which generalizes the ScanRefer dataset and task. Our dataset contains 61,926 descriptions of 11,609 objects, where each description references zero, one, or multiple target objects. We also introduce a new evaluation metric and benchmark methods from prior work to enable further investigation of multi-modal 3D scene understanding. Furthermore, we develop a stronger baseline that leverages 2D features from CLIP by rendering object proposals online with contrastive learning, and that outperforms the state of the art on the ScanRefer benchmark.

research
05/23/2015

Text to 3D Scene Generation with Rich Lexical Grounding

The ability to map descriptions of scenes to 3D geometric representation...
research
05/23/2023

Cross3DVG: Baseline and Dataset for Cross-Dataset 3D Visual Grounding on Different RGB-D Scans

We present Cross3DVG, a novel task for cross-dataset visual grounding in...
research
03/29/2023

EgoTV: Egocentric Task Verification from Natural Language Task Descriptions

To enable progress towards egocentric agents capable of understanding ev...
research
07/11/2018

Towards Understanding End-of-trip Instructions in a Taxi Ride Scenario

We introduce a dataset containing human-authored descriptions of target ...
research
12/18/2019

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

We introduce the new task of 3D object localization in RGB-D scans using...
research
03/24/2023

OPDMulti: Openable Part Detection for Multiple Objects

Openable part detection is the task of detecting the openable parts of a...
research
07/24/2023

Exposing the Troublemakers in Described Object Detection

Detecting objects based on language descriptions is a popular task that ...
