VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

03/25/2023
by Ziqin Wang, et al.

Predicting 3D semantic scene graphs (3DSSG) from point clouds is challenging because (1) a 3D point cloud captures only geometric structure, with limited semantics compared to 2D images, and (2) the long-tailed relation distribution inherently hinders the learning of unbiased predictions. Since 2D images provide rich semantics and scene graphs are by nature closely tied to language, we propose the Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme, which can significantly strengthen the ability of 3DSSG prediction models to discriminate long-tailed and ambiguous semantic relations. The key idea is to train a powerful multi-modal oracle model to assist the 3D model. This oracle learns reliable structural representations based on semantics from vision, language, and 3D geometry, and its benefits can be heterogeneously passed to the 3D model during the training stage. By effectively utilizing visual-linguistic semantics in training, VL-SAT can significantly boost common 3DSSG prediction models, such as SGFN and SGGpoint, while requiring only 3D inputs at inference, especially when dealing with tail relation triplets. Comprehensive evaluations and ablation studies on the 3DSSG dataset validate the effectiveness of the proposed scheme. Code is available at https://github.com/wz7in/CVPR2023-VLSAT.
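
To make the train-time/inference-time split concrete, here is a minimal sketch of oracle-assisted training in plain Python. It is not the authors' implementation: the abstract does not specify how the oracle's benefits are passed to the 3D model, so the L1 feature-alignment term, the weight `align_weight`, and every function name below are assumptions chosen purely for illustration. The sketch only shows that the oracle's outputs enter the training loss, while prediction uses the 3D branch alone.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one relation prediction against its true label index."""
    return -math.log(softmax(logits)[label])


def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def assisted_training_loss(logits_3d, logits_oracle, feat_3d, feat_oracle,
                           label, align_weight=0.1):
    """Hypothetical training objective (an assumption, not the paper's loss).

    Both branches are supervised on the relation label, and the 3D branch's
    edge feature is pulled toward the multi-modal oracle's edge feature.
    """
    task_3d = cross_entropy(logits_3d, label)
    task_oracle = cross_entropy(logits_oracle, label)
    align = l1_distance(feat_3d, feat_oracle)
    return task_3d + task_oracle + align_weight * align


def predict_relation(logits_3d):
    """Inference uses the 3D branch only; no oracle inputs are needed."""
    return max(range(len(logits_3d)), key=lambda i: logits_3d[i])
```

In this sketch, `assisted_training_loss` would be called once per relation edge during training, with the oracle's logits and features computed from image, text, and 3D inputs. At test time only `predict_relation` is called on the 3D model's logits, which mirrors the abstract's claim that inference needs 3D inputs alone.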


