ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection

08/15/2023
by Jifeng Shen, et al.

Effective feature fusion of multispectral images plays a crucial role in multispectral object detection. Previous studies have demonstrated the effectiveness of feature fusion using convolutional neural networks, but these methods are sensitive to image misalignment because of their inherent deficiency in local-range feature interaction, which degrades performance. To address this issue, a novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction and capture complementary information across modalities simultaneously. This framework enhances the discriminability of object features through a query-guided cross-attention mechanism, leading to improved performance. However, stacking multiple transformer blocks for feature enhancement incurs a large number of parameters and high spatial complexity. To handle this, inspired by the human process of reviewing knowledge, an iterative interaction mechanism is proposed that shares parameters among block-wise multimodal transformers, reducing model complexity and computation cost. The proposed method is general and can be effectively integrated into different detection frameworks and used with different backbones. Experimental results on the KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios. Code will be available at https://github.com/chanchanchan97/ICAFusion.
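The two ideas in the abstract, dual cross-attention between modalities and reuse of one set of weights across iterations, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the single-head attention, the residual connection, and the choice that each modality queries the other are all assumptions made for illustration, and normalization, feed-forward layers, and the detection backbone are omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head cross-attention: queries come from one modality,
    keys and values from the other. A residual connection is assumed."""
    q = matmul(query_feats, w_q)
    k = matmul(context_feats, w_k)
    v = matmul(context_feats, w_v)
    scale = 1.0 / math.sqrt(len(w_q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    attended = matmul(softmax_rows(scores), v)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(query_feats, attended)]


def iterative_dual_cross_attention(rgb, thermal, params_rgb, params_thermal, n_iters):
    """Apply the same two cross-attention blocks n_iters times.
    Reusing params_rgb and params_thermal in every iteration is the
    parameter-sharing idea; a stacked design would need new weights per block."""
    for _ in range(n_iters):
        new_rgb = cross_attention(rgb, thermal, *params_rgb)
        new_thermal = cross_attention(thermal, rgb, *params_thermal)
        rgb, thermal = new_rgb, new_thermal
    return rgb, thermal


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights or features."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def count_params(*param_sets):
    """Total number of scalar weights across the given (w_q, w_k, w_v) tuples."""
    return sum(len(w) * len(w[0]) for params in param_sets for w in params)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, tokens = 8, 4  # toy sizes, chosen only for the demonstration
    rgb = random_matrix(tokens, dim, rng)
    thermal = random_matrix(tokens, dim, rng)
    params_rgb = tuple(random_matrix(dim, dim, rng) for _ in range(3))
    params_thermal = tuple(random_matrix(dim, dim, rng) for _ in range(3))
    fused_rgb, fused_thermal = iterative_dual_cross_attention(
        rgb, thermal, params_rgb, params_thermal, n_iters=3
    )
    print(len(fused_rgb), len(fused_rgb[0]), count_params(params_rgb, params_thermal))
```

In this sketch the weight count stays at two sets of three projection matrices however many iterations are run, whereas stacking distinct blocks would multiply it by the number of blocks. The compute per iteration is unchanged; only the parameter footprint shrinks.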

