FusionFormer: A Multi-sensory Fusion in Bird's-Eye-View and Temporal Consistent Transformer for 3D Object Detection

09/11/2023
by Chunyong Hu, et al.

Multi-modal sensor fusion has demonstrated strong advantages in 3D object detection tasks. However, existing methods that fuse multi-modal features through simple channel concatenation require transforming the features into bird's-eye-view (BEV) space, which can discard information along the Z-axis and thus degrade performance. To this end, we propose FusionFormer, an end-to-end multi-modal fusion framework that leverages transformers to fuse multi-modal features and obtain fused BEV features. Exploiting FusionFormer's flexible adaptability to input modality representations, we further propose a depth prediction branch that can be added to the framework to improve detection performance in camera-based detection tasks. In addition, we propose a plug-and-play transformer-based temporal fusion module that fuses BEV features from historical frames for more stable and reliable detection results. We evaluate our method on the nuScenes dataset and achieve 72.6 NDS on the 3D object detection task, outperforming state-of-the-art methods.
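The abstract outlines the core mechanism: learnable BEV queries are updated by attending to each modality's features in turn, and a temporal module then attends to a previous frame's BEV features. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: it substitutes standard multi-head cross-attention for the deformable attention commonly used in BEV transformers, and all module names, dimensions, and the single-frame temporal window are illustrative assumptions.

```python
# Minimal sketch of transformer-based BEV fusion (illustrative, not the paper's code).
import torch
import torch.nn as nn

class BEVFusionLayer(nn.Module):
    """BEV queries cross-attend to camera features, then LiDAR features."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cam_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(3)])

    def forward(self, bev_queries, cam_feats, lidar_feats):
        # Residual connections keep the BEV queries stable across attention steps.
        q = self.norms[0](bev_queries + self.cam_attn(bev_queries, cam_feats, cam_feats)[0])
        q = self.norms[1](q + self.lidar_attn(q, lidar_feats, lidar_feats)[0])
        return self.norms[2](q + self.ffn(q))

class TemporalFusion(nn.Module):
    """Plug-and-play temporal module: current BEV attends to the previous frame's BEV."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, bev_now, bev_prev):
        return self.norm(bev_now + self.attn(bev_now, bev_prev, bev_prev)[0])

# Toy usage: a 50x50 BEV grid flattened into 2500 query tokens (shapes are assumptions).
B, H, W, D = 2, 50, 50, 256
queries = torch.zeros(B, H * W, D)   # learnable parameters in practice
cam = torch.randn(B, 1000, D)        # flattened multi-view image features
lidar = torch.randn(B, 1500, D)      # flattened voxel/BEV LiDAR features
fused = BEVFusionLayer(D)(queries, cam, lidar)
fused = TemporalFusion(D)(fused, torch.randn(B, H * W, D))
print(fused.shape)  # torch.Size([2, 2500, 256])
```

The design point this illustrates: fusing in query space rather than by channel concatenation lets each modality keep its native representation (e.g., voxel features that retain Z-axis information), which is the advantage the abstract claims over concatenation-based BEV fusion.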
