LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

03/07/2023
by Xin Li, et al.

LiDAR-camera fusion methods have shown impressive performance in 3D object detection. Recent advanced multi-modal methods mainly perform global fusion, where image features and point cloud features are fused across the whole scene. Such practice lacks fine-grained, region-level information, yielding suboptimal fusion performance. In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both local and global levels. Concretely, the Global Fusion (GoF) of LoGoNet builds on previous literature, but we exclusively use point centroids to more precisely represent the position of voxel features, thus achieving better cross-modal alignment. For the Local Fusion (LoF), we first divide each proposal into uniform grids and then project the grid centers onto the images. The image features around the projected grid points are sampled and fused with position-decorated point cloud features, making full use of the rich contextual information around the proposals. The Feature Dynamic Aggregation (FDA) module is further proposed to enable interaction between these locally and globally fused features, producing more informative multi-modal features. Extensive experiments on both the Waymo Open Dataset (WOD) and the KITTI dataset show that LoGoNet outperforms all state-of-the-art 3D detection methods. Notably, LoGoNet ranks 1st on the Waymo 3D object detection leaderboard with 81.02 mAPH (L2). For the first time, the detection performance on all three classes surpasses 80 APH (L2) simultaneously. Code will be available at <https://github.com/sankin97/LoGoNet>.
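To make the Local Fusion (LoF) step more concrete, the sketch below shows one way the grid-projection-and-sampling idea could be implemented. It is a minimal illustration, not the authors' code: the grid size, feature dimensions, projection matrix, and all function names (proposal_grid_centers, project_to_image, sample_image_features) are assumptions for demonstration, and proposal rotation and the subsequent FDA interaction are omitted.

```python
# Minimal sketch of the Local Fusion (LoF) idea: divide a 3D proposal into a
# uniform grid, project the grid centers onto the image, bilinearly sample
# image features there, and concatenate them with position-decorated point
# features. Shapes, names, and the projection matrix are illustrative only.
import torch
import torch.nn.functional as F


def proposal_grid_centers(box, grid_size=6):
    """Uniform grid centers inside an axis-aligned 3D proposal box.

    box: (cx, cy, cz, dx, dy, dz) -- center and size; rotation omitted for brevity.
    Returns (grid_size**3, 3) points.
    """
    cx, cy, cz, dx, dy, dz = box
    # Cell-center offsets in (-0.5, 0.5) along each axis.
    steps = (torch.arange(grid_size, dtype=torch.float32) + 0.5) / grid_size - 0.5
    gx, gy, gz = torch.meshgrid(steps * dx, steps * dy, steps * dz, indexing="ij")
    centers = torch.stack([gx + cx, gy + cy, gz + cz], dim=-1)
    return centers.reshape(-1, 3)


def project_to_image(points_3d, proj_mat):
    """Project 3D points to pixel coordinates with a 3x4 projection matrix."""
    ones = torch.ones(points_3d.shape[0], 1)
    pts_h = torch.cat([points_3d, ones], dim=1)        # (N, 4) homogeneous points
    uvw = pts_h @ proj_mat.t()                         # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)    # (N, 2) pixel coordinates


def sample_image_features(feat_map, uv, img_hw):
    """Bilinearly sample a (C, H, W) feature map at pixel locations uv."""
    H, W = img_hw
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    norm = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = norm.view(1, 1, -1, 2)                      # (1, 1, N, 2)
    sampled = F.grid_sample(feat_map.unsqueeze(0), grid,
                            mode="bilinear", align_corners=True)
    return sampled.squeeze(0).squeeze(1).t()           # (N, C)


# Toy usage: the box is given directly in camera coordinates (x right, y down,
# z forward), so no LiDAR-to-camera extrinsic transform is needed here.
box = (2.0, 1.0, 10.0, 1.8, 1.6, 4.0)
centers = proposal_grid_centers(box)                   # (216, 3) grid centers
proj = torch.tensor([[700., 0., 600., 0.],
                     [0., 700., 200., 0.],
                     [0., 0., 1., 0.]])                # dummy intrinsics
uv = project_to_image(centers, proj)
img_feats = sample_image_features(torch.randn(64, 48, 160), uv, img_hw=(384, 1280))
point_feats = torch.randn(centers.shape[0], 32)        # position-decorated point features
fused = torch.cat([point_feats, img_feats], dim=-1)    # (216, 96) locally fused features
print(fused.shape)
```

In this reading, each proposal contributes a fixed-size set of grid features regardless of how many raw points it contains, which is what lets the local image context be attached per region rather than per scene.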

