ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes

12/12/2022
by Ahmed Abdelreheem, et al.

The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D data. In this paper, we curate a large-scale, complementary dataset that extends both by associating every object mentioned in a referential sentence with its underlying instance inside a 3D scene. Specifically, our Scan Entities in 3D (ScanEnts3D) dataset provides explicit correspondences between 369k objects across 84k natural referential sentences, covering 705 real-world scenes. Crucially, we show that by incorporating intuitive losses that enable learning from this novel dataset, we can significantly improve the performance of several recently introduced neural listening architectures, including improving the SoTA in both the Nr3D and ScanRefer benchmarks by 4.3 [...]. We also experiment with competitive baselines and recent methods for the task of language generation and show that, as with neural listeners, 3D neural speakers can also noticeably benefit from training with ScanEnts3D, including improving the SoTA by 13.2 CIDEr points on the Nr3D benchmark. Overall, our carefully conducted experimental studies strongly support the conclusion that, by learning on ScanEnts3D, commonly used visio-linguistic 3D architectures can become more efficient and interpretable in their generalization without needing these newly collected annotations at test time. The project's webpage is https://scanents3d.github.io/.
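
To make the idea of phrase-to-object correspondences concrete, the following is a minimal sketch of how one annotated sentence might be represented in memory. It is not the dataset's actual file format: the class names, field names, scene id, and object ids below are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PhraseLink:
    """One noun phrase in a sentence and the scene objects it refers to (hypothetical schema)."""
    phrase: str
    char_span: tuple[int, int]  # [start, end) character offsets into the sentence
    object_ids: list[int]       # illustrative instance ids within the scene


@dataclass
class AnnotatedSentence:
    """A referential sentence with every mentioned object linked to a scene instance."""
    scene_id: str
    sentence: str
    target_id: int              # the object the sentence as a whole refers to
    links: list[PhraseLink] = field(default_factory=list)

    def anchor_ids(self) -> set[int]:
        """Return ids of mentioned objects other than the target."""
        mentioned = {i for link in self.links for i in link.object_ids}
        return mentioned - {self.target_id}


# Made-up example: ids 3 and 7 are placeholders, not real annotations.
sentence = "the chair next to the brown table"
example = AnnotatedSentence(
    scene_id="scene_demo",
    sentence=sentence,
    target_id=3,
    links=[
        PhraseLink("the chair", (0, 9), [3]),
        PhraseLink("the brown table", (18, 33), [7]),
    ],
)
```

In this sketch, the target object and the other mentioned objects are kept in one list of links, so a training loss could supervise both from the same record while the annotations remain unnecessary at test time.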


research
05/08/2023

Video Object Segmentation in Panoptic Wild Scenes

In this paper, we introduce semi-supervised video object segmentation (V...
research
03/17/2022

Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans

3D object recognition has seen significant advances in recent years, sho...
research
02/28/2017

SceneSeer: 3D Scene Design with Natural Language

Designing 3D scenes is currently a creative task that requires significa...
research
06/06/2022

Scan2Part: Fine-grained and Hierarchical Part-level Understanding of Real-World 3D Scans

We propose Scan2Part, a method to segment individual parts of objects in...
research
04/05/2022

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Objects play a crucial role in our everyday activities. Though multisens...
research
12/18/2019

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

We introduce the new task of 3D object localization in RGB-D scans using...
