Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud

03/30/2021
by   Mingtao Feng, et al.
0

3D object grounding aims to locate the most relevant target object in a raw point cloud scene based on a free-form language description. Understanding complex and diverse descriptions, and lifting them directly to a point cloud is a new and challenging topic due to the irregular and sparse nature of point clouds. There are three main challenges in 3D object grounding: to find the main focus in the complex and diverse description; to understand the point cloud scene; and to locate the target object. In this paper, we address all three challenges. Firstly, we propose a language scene graph module to capture the rich structure and long-distance phrase correlations. Secondly, we introduce a multi-level 3D proposal relation graph module to extract the object-object and object-scene co-occurrence relationships, and strengthen the visual features of the initial proposals. Lastly, we develop a description guided 3D visual graph module to encode global contexts of phrases and proposals by a nodes matching strategy. Extensive experiments on challenging benchmark datasets (ScanRefer and Nr3D) show that our algorithm outperforms existing state-of-the-art. Our code is available at https://github.com/PNXD/FFL-3DOG.

READ FULL TEXT

page 4

page 8

research
04/13/2022

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

3D visual grounding aims to locate the referred target object in 3D poin...
research
09/29/2022

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual and Language Learning

3D visual grounding aims to find the objects within point clouds mention...
research
03/01/2021

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

Compared with the visual grounding in 2D images, the natural-language-gu...
research
11/20/2019

Learning Cross-modal Context Graph for Visual Grounding

Visual grounding is a ubiquitous building block in many vision-language ...
research
07/25/2023

3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding

3D visual grounding aims to localize the target object in a 3D point clo...
research
08/23/2023

A Unified Framework for 3D Point Cloud Visual Grounding

3D point cloud visual grounding plays a critical role in 3D scene compre...
research
11/25/2022

Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

The 3D visual grounding task has been explored with visual and language ...

Please sign up or login with your details

Forgot password? Click here to reset