OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

04/11/2023
by Yunpeng Zhang, et al.

Vision-based perception for autonomous driving has shifted from bird's-eye-view (BEV) representations to 3D semantic occupancy. Compared with BEV planes, 3D semantic occupancy additionally provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network that effectively processes the 3D volume for semantic occupancy prediction. OccFormer achieves long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features by decomposing the heavy 3D processing into local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former to 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on the SemanticKITTI dataset and for LiDAR semantic segmentation on the nuScenes dataset. Code is available at <https://github.com/zhangyp15/OccFormer>.

research
02/15/2023

Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

Modern methods for vision-centric autonomous driving perception widely a...
research
08/31/2023

PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction

Semantic segmentation in autonomous driving has been undergoing an evolu...
research
04/22/2023

LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation

Semantic map construction under bird's-eye view (BEV) plays an essential...
research
04/12/2022

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Although vision transformers (ViTs) have achieved great success in compu...
research
04/11/2022

HFT: Lifting Perspective Representations via Hybrid Feature Transformation

Autonomous driving requires accurate and detailed Bird's Eye View (BEV) ...
research
11/25/2021

ContourletNet: A Generalized Rain Removal Architecture Using Multi-Direction Hierarchical Representation

Images acquired from rainy scenes usually suffer from bad visibility whi...
research
06/27/2023

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal ...
