WCCNet: Wavelet-integrated CNN with Crossmodal Rearranging Fusion for Fast Multispectral Pedestrian Detection

08/02/2023
by Xingjian Wang, et al.

Multispectral pedestrian detection offers better visibility in challenging conditions and therefore has broad applications in various tasks, for which both accuracy and computational cost are of paramount importance. Most existing approaches treat the RGB and infrared modalities equally, typically adopting two symmetrical CNN backbones for multimodal feature extraction. This ignores the substantial differences between the modalities and makes it difficult both to reduce the computational cost and to fuse the modalities effectively. In this work, we propose a novel and efficient framework named WCCNet that differentially extracts rich features from the different spectra at lower computational complexity and semantically rearranges these features for effective crossmodal fusion. Specifically, the discrete wavelet transform (DWT), which allows fast inference and training, is embedded to construct a dual-stream backbone for efficient feature extraction. The DWT layers of WCCNet extract frequency components for the infrared modality, while the CNN layers extract spatial-domain features for the RGB modality. This design not only significantly reduces the computational complexity, but also improves the extraction of infrared features, which facilitates the subsequent crossmodal fusion. Based on these well-extracted features, we design the crossmodal rearranging fusion module (CMRF), which mitigates spatial misalignment and merges semantically complementary features of spatially related local regions to amplify the crossmodal complementary information. We conduct comprehensive evaluations on the KAIST and FLIR benchmarks, on which WCCNet outperforms state-of-the-art methods, offering considerable computational efficiency together with competitive accuracy. We also perform an ablation study and thoroughly analyze the impact of the different components on the performance of WCCNet.
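
To make the role of the DWT layers more concrete, the following is a minimal sketch of a single-level 2D discrete wavelet transform applied to one image channel. It assumes the Haar wavelet and a NumPy array as input; the abstract does not state which wavelet WCCNet uses, and the function name `haar_dwt2` is hypothetical, so this illustrates the general technique rather than the authors' implementation.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT of an (H, W) array with even H and W.

    Returns four sub-bands, each of shape (H/2, W/2): one low-frequency
    approximation and three high-frequency detail bands.
    Assumption: Haar wavelet with orthonormal scaling (illustrative only).
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    approx = (a + b + c + d) / 2.0      # low-pass in both directions
    col_detail = (a - b + c - d) / 2.0  # left column minus right column
    row_detail = (a + b - c - d) / 2.0  # top row minus bottom row
    diag_detail = (a - b - c + d) / 2.0 # diagonal difference
    return approx, col_detail, row_detail, diag_detail


if __name__ == "__main__":
    image = np.arange(16, dtype=np.float64).reshape(4, 4)
    bands = haar_dwt2(image)
    # Stacking the sub-bands gives a 4-channel map at half the resolution.
    stacked = np.stack(bands, axis=0)
    print(stacked.shape)
```

The transform halves the spatial resolution while keeping all of the input's information in four sub-bands, and it has no learnable parameters. Properties of this kind are one plausible reason a DWT-based stream can be cheaper than a stack of strided convolutions, although the abstract does not detail how the sub-bands are consumed downstream.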


research
07/14/2020

A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection

Existing RGB-D salient object detection (SOD) approaches concentrate on ...
research
11/20/2019

MMTM: Multimodal Transfer Module for CNN Fusion

In late fusion, each modality is processed in a separate unimodal Convol...
research
02/11/2018

FD-MobileNet: Improved MobileNet with a Fast Downsampling Strategy

We present Fast-Downsampling MobileNet (FD-MobileNet), an efficient and ...
research
08/07/2020

Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems

Multispectral pedestrian detection is capable of adapting to insufficien...
research
07/04/2022

TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection

Existing RGB-D SOD methods mainly rely on a symmetric two-stream CNN-bas...
research
08/27/2019

Cooperative Cross-Stream Network for Discriminative Action Representation

Spatial and temporal stream model has gained great success in video acti...
research
02/24/2023

Revisiting Modality Imbalance In Multimodal Pedestrian Detection

Multimodal learning, particularly for pedestrian detection, has recently...
