3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

by Junyu Luo et al.

3D visual grounding aims to locate the referred target object in a 3D point cloud scene according to a free-form language description. Previous methods mostly follow a two-stage paradigm, i.e., language-irrelevant detection followed by cross-modal matching, which is limited by its isolated architecture. In this paradigm, the detector must sample keypoints from the raw point cloud, owing to the irregular, large-scale nature of 3D point clouds, and generate an object proposal for each keypoint. However, sparse proposals may miss the target during detection, while dense proposals may confuse the matching model. Moreover, the language-irrelevant detection stage samples only a small proportion of keypoints on the target, degrading the target prediction. In this paper, we propose a 3D Single-Stage Referred Point Progressive Selection (3D-SPS) method, which progressively selects keypoints under the guidance of language and directly locates the target. Specifically, we propose a Description-aware Keypoint Sampling (DKS) module to coarsely focus on points of language-relevant objects, which provide significant clues for grounding. In addition, we devise a Target-oriented Progressive Mining (TPM) module to finely concentrate on points of the target, enabled by progressive intra-modal relation modeling and inter-modal target mining. 3D-SPS bridges the gap between detection and matching in the 3D visual grounding task, localizing the target in a single stage. Experiments demonstrate that 3D-SPS achieves state-of-the-art performance on both the ScanRefer and Nr3D/Sr3D datasets.
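The core idea of description-aware keypoint selection can be sketched as follows. This is a minimal illustration only, not the paper's actual DKS module: the function name, the dot-product scoring, and the simple top-k rule are all assumptions standing in for the learned cross-modal scoring described in the abstract.

```python
import numpy as np

def select_keypoints(point_feats, text_feat, k):
    """Language-guided keypoint selection (hypothetical sketch):
    score each point by its similarity to the description feature,
    then keep the k highest-scoring points as keypoints."""
    scores = point_feats @ text_feat          # (N,) relevance score per point
    top_idx = np.argsort(scores)[-k:][::-1]   # indices of the k best points
    return top_idx, point_feats[top_idx]

# Toy example: 6 points with 4-dim features and one description feature.
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 4))
txt = rng.normal(size=4)
idx, keypoints = select_keypoints(pts, txt, k=3)
print(idx.shape, keypoints.shape)  # (3,) (3, 4)
```

In the actual method, this coarse selection would be repeated and refined by the TPM module, which alternates relation modeling among the surviving points with further language-conditioned filtering.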


