Learning 3D Semantics from Pose-Noisy 2D Images with Hierarchical Full Attention Network

04/17/2022
by   Yuhang He, et al.

We propose a novel framework to learn 3D point cloud semantics from 2D multi-view image observations that contain pose error. On the one hand, directly learning from massive, unstructured and unordered 3D point clouds is computationally and algorithmically more difficult than learning from compactly organized, context-rich 2D RGB images. On the other hand, standard automated-driving datasets capture both LiDAR point clouds and RGB images. This motivates a "task transfer" paradigm in which 3D semantic segmentation benefits from aggregating 2D semantic cues, even though the 2D image observations carry pose noise. Among all difficulties, pose noise and erroneous predictions from 2D semantic segmentation approaches are the main challenges for the task transfer. To alleviate the influence of these factors, we perceive each 3D point through multi-view images, associating a patch observation with each image. Moreover, the semantic labels of a block of neighboring 3D points are predicted simultaneously, which lets us exploit the structural prior among points to further improve performance. A hierarchical full attention network (HiFANet) is designed to sequentially aggregate patch, bag-of-frames and inter-point semantic cues, with attention mechanisms tailored to each level of semantic cues. In addition, each attention block substantially reduces the feature size before feeding the next block, keeping the framework slim. Experimental results on Semantic-KITTI show that the proposed framework significantly outperforms existing 3D point cloud based methods, while requiring much less training data and exhibiting tolerance to pose noise. The code is available at https://github.com/yuhanghe01/HiFANet.
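The aggregation order described in the abstract (patch, then bag-of-frames, then inter-point, with feature reduction between stages) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation; the module names, feature dimensions, mean-pooling scheme and head counts below are all illustrative assumptions.

```python
# A hypothetical sketch of hierarchical attention aggregation, not the
# published HiFANet code. Dimensions and module names are assumptions.
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Self-attend over a set of features, then pool and shrink them."""

    def __init__(self, in_dim: int, out_dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(in_dim, heads, batch_first=True)
        self.reduce = nn.Linear(in_dim, out_dim)  # shrink before next stage

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, set_size, in_dim)
        x, _ = self.attn(x, x, x)          # aggregate cues within the set
        return self.reduce(x.mean(dim=1))  # pool to (batch, out_dim)


class HiFANetSketch(nn.Module):
    """Patch -> bag-of-frames -> inter-point attention over a point block."""

    def __init__(self, patch_dim=256, frame_dim=128, point_dim=64, classes=19):
        super().__init__()
        self.patch_attn = AttentionPool(patch_dim, frame_dim)  # within one image patch
        self.frame_attn = AttentionPool(frame_dim, point_dim)  # across multi-view observations
        self.point_attn = nn.MultiheadAttention(point_dim, 4, batch_first=True)
        self.classify = nn.Linear(point_dim, classes)  # e.g. 19 Semantic-KITTI classes

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (points, views, patch_tokens, patch_dim) for one block
        p, v, t, d = patch_feats.shape
        per_view = self.patch_attn(patch_feats.reshape(p * v, t, d))  # (p*v, frame_dim)
        per_point = self.frame_attn(per_view.reshape(p, v, -1))       # (p, point_dim)
        # Inter-point attention exploits structure among neighboring points.
        refined, _ = self.point_attn(per_point[None], per_point[None], per_point[None])
        return self.classify(refined[0])                              # (points, classes)


# Usage: 32 neighboring points, each seen in 5 views of 16 patch tokens.
logits = HiFANetSketch()(torch.randn(32, 5, 16, 256))
```

Because each stage pools a set into a single lower-dimensional vector before the next attention block runs, the later stages operate on progressively smaller tensors; this is the property the abstract describes as keeping the framework slim.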

