LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception

by Zixiang Zhou, et al.

There is a recent trend in the LiDAR perception field towards unifying multiple tasks in a single strong network with improved performance, as opposed to using separate networks for each task. In this paper, we introduce a new LiDAR multi-task learning paradigm based on the transformer. The proposed LiDARFormer utilizes cross-space global contextual feature information and exploits cross-task synergy to boost the performance of LiDAR perception tasks across multiple large-scale datasets and benchmarks. Our novel transformer-based framework includes a cross-space transformer module that learns attentive features between the 2D dense Bird's Eye View (BEV) and 3D sparse voxel feature maps. Additionally, we propose a transformer decoder for the segmentation task that dynamically adjusts the learned features by leveraging the categorical feature representations. Furthermore, we combine the segmentation and detection features in a shared transformer decoder with cross-task attention layers to enhance and integrate the object-level and class-level features. LiDARFormer is evaluated on the large-scale nuScenes and Waymo Open datasets for both 3D detection and semantic segmentation, and it outperforms all previously published methods on both tasks. Notably, LiDARFormer achieves state-of-the-art scores of 76.4 and 74.3 as a single-model, LiDAR-only method.
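The cross-space attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain single-head scaled dot-product cross-attention in NumPy, in which sparse voxel features act as queries and a flattened dense BEV map acts as keys and values. The feature sizes, the random projection matrices, the residual fusion step, and all names (`cross_attention`, `voxel_feats`, `bev_feats`) are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: (N, C) features that gather information
    context: (M, C) features that are attended to
    Returns the attended features (N, D) and the attention weights (N, M).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
C = D = 16                                  # hypothetical channel sizes
voxel_feats = rng.normal(size=(50, C))      # 50 non-empty sparse voxels (hypothetical)
bev_feats = rng.normal(size=(8 * 8, C))     # flattened 8x8 dense BEV map (hypothetical)
w_q, w_k, w_v = (rng.normal(size=(C, D)) / np.sqrt(C) for _ in range(3))

# Each voxel attends over every BEV cell, giving it global context.
attended, weights = cross_attention(voxel_feats, bev_feats, w_q, w_k, w_v)

# Assumed residual fusion; requires D == C.
fused_voxel_feats = voxel_feats + attended
```

Swapping the roles of `voxel_feats` and `bev_feats` gives attention in the opposite direction. A practical model would use learned multi-head projections and positional encodings, both omitted here.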




