A Unified Framework for 3D Point Cloud Visual Grounding

08/23/2023
by Haojia Lin, et al.

3D point cloud visual grounding plays a critical role in 3D scene comprehension, encompassing 3D referring expression comprehension (3DREC) and segmentation (3DRES). We argue that 3DREC and 3DRES should be unified in one framework, which is also a natural progression in the community. To explain, 3DREC can help 3DRES locate the referent, while 3DRES can in turn facilitate 3DREC via more fine-grained language-visual alignment. To achieve this, this paper takes the initial step of integrating 3DREC and 3DRES into a unified framework, termed 3D Referring Transformer (3DRefTR). Its key idea is to build upon a mature 3DREC model and reuse the query embeddings and visual tokens that model already produces to construct a dedicated mask branch. Specifically, we propose the Superpoint Mask Branch, which serves a dual purpose: i) by leveraging heterogeneous CPU-GPU parallelism, the CPU produces superpoints while the GPU is occupied generating visual tokens, equivalently accomplishing the upsampling computation; ii) by exploiting the inherent association between the superpoints and the point cloud, it eliminates the heavy computational overhead of upsampling high-resolution visual features. This elegant design enables 3DRefTR to achieve both well-performing 3DRES and 3DREC capacities with only a 6 3DREC model. Empirical evaluations affirm the superiority of 3DRefTR. Specifically, on the ScanRefer dataset, 3DRefTR surpasses the state-of-the-art 3DRES method by 12.43 Acc@0.25IoU.
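
The superpoint-to-point association mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `superpoint_mask_to_points`, the dot-product scoring, the zero threshold, and the toy data are all assumptions made for illustration. The point is only that a mask predicted at superpoint resolution reaches every point through an index lookup, so no high-resolution feature map has to be computed.

```python
def superpoint_mask_to_points(superpoint_feats, query, superpoint_ids, threshold=0.0):
    """Score each superpoint against one query, then copy the result to points.

    superpoint_feats: list of S feature vectors (hypothetical pooled features)
    query:            one query embedding of the same dimension
    superpoint_ids:   for each of the N points, the index of its superpoint
    """
    # One logit per superpoint: dot product between its feature and the query.
    sp_logits = [sum(f * q for f, q in zip(feat, query)) for feat in superpoint_feats]
    # Each point inherits the decision of the superpoint it belongs to.
    return [sp_logits[sp] > threshold for sp in superpoint_ids]


# Toy data (assumed): 6 points grouped into 3 superpoints, 2-D features.
superpoint_ids = [0, 0, 1, 2, 2, 2]
superpoint_feats = [[1.0, 0.0], [-1.0, 0.5], [0.2, 0.3]]
query = [1.0, 1.0]

point_mask = superpoint_mask_to_points(superpoint_feats, query, superpoint_ids)
print(point_mask)
```

In this sketch the scoring cost scales with the number of superpoints rather than the number of points, and the per-point step is a plain lookup. A real model would run the same idea on tensors.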

research
12/25/2022

Neural Shape Compiler: A Unified Framework for Transforming between Text, Point Cloud, and Program

3D shapes have complementary abstractions from low-level geometry to par...
research
03/30/2021

Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud

3D object grounding aims to locate the most relevant target object in a ...
research
11/26/2022

Meta Architecture for Point Cloud Analysis

Recent advances in 3D point cloud analysis bring a diverse set of networ...
research
03/30/2022

SeqTR: A Simple yet Universal Network for Visual Grounding

In this paper, we propose a simple yet universal network termed SeqTR fo...
research
06/06/2021

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

As an important step towards visual reasoning, visual grounding (e.g., p...
research
08/01/2019

A Unified Point-Based Framework for 3D Segmentation

3D point cloud segmentation remains challenging for structureless and te...
