Spatio-channel Attention Blocks for Cross-modal Crowd Counting

10/19/2022
by Youjia Zhang, et al.

Crowd counting research has made significant advances in real-world applications, but it remains a formidable challenge in cross-modal settings. Most existing methods rely solely on the optical features of RGB images, ignoring the feasibility of leveraging other modalities such as thermal and depth images. The inherently large differences between modalities and the diversity of design choices for model architectures make cross-modal crowd counting more challenging. In this paper, we propose Cross-modal Spatio-Channel Attention (CSCA) blocks, which can be easily integrated into any modality-specific architecture. The CSCA blocks first capture global feature correlations across modalities with little overhead through spatial-wise cross-modal attention. The spatially attended cross-modal features are then refined through adaptive channel-wise feature aggregation. In our experiments, the proposed block consistently yields significant performance improvements across various backbone networks, achieving state-of-the-art results on RGB-T and RGB-D crowd counting.
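The abstract describes a two-stage fusion block: spatial-wise cross-modal attention computed at reduced spatial cost, followed by adaptive channel-wise aggregation of the attended features. The snippet below is a minimal PyTorch sketch of that idea only, not the authors' implementation; the module name CSCABlockSketch, the pooling stride, the SE-style channel gate, and all dimensions are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): one plausible reading of a CSCA-style block,
# combining spatial-wise cross-modal attention with channel-wise feature aggregation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CSCABlockSketch(nn.Module):
    """Illustrative cross-modal spatio-channel attention block.

    x_rgb, x_aux: feature maps of shape (B, C, H, W) from two modality-specific
    branches (e.g. RGB and thermal/depth). Channel size, pooling stride, and
    reduction ratio are illustrative choices, not values from the paper.
    """

    def __init__(self, dim: int, pool_stride: int = 4, reduction: int = 4):
        super().__init__()
        self.query = nn.Conv2d(dim, dim, kernel_size=1)
        self.key = nn.Conv2d(dim, dim, kernel_size=1)
        self.value = nn.Conv2d(dim, dim, kernel_size=1)
        # Downsample keys/values so the spatial attention map stays small (less overhead).
        self.pool = nn.AvgPool2d(kernel_size=pool_stride, stride=pool_stride)
        # SE-style gate for adaptive channel-wise aggregation of the two streams.
        self.channel_gate = nn.Sequential(
            nn.Linear(2 * dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x_rgb: torch.Tensor, x_aux: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x_rgb.shape
        # Spatial-wise cross-modal attention: RGB queries attend over pooled
        # auxiliary-modality keys and values.
        q = self.query(x_rgb).flatten(2).transpose(1, 2)               # (B, HW, C)
        k = self.key(self.pool(x_aux)).flatten(2)                      # (B, C, hw)
        v = self.value(self.pool(x_aux)).flatten(2).transpose(1, 2)    # (B, hw, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)                 # (B, HW, hw)
        cross = (attn @ v).transpose(1, 2).reshape(b, c, h, w)         # (B, C, H, W)

        # Adaptive channel-wise aggregation of the attended cross-modal features
        # with the original RGB features.
        stats = torch.cat([F.adaptive_avg_pool2d(x_rgb, 1),
                           F.adaptive_avg_pool2d(cross, 1)], dim=1).flatten(1)  # (B, 2C)
        gate = self.channel_gate(stats).view(b, c, 1, 1)               # (B, C, 1, 1)
        return gate * cross + (1 - gate) * x_rgb


# Usage: fuse features from two modality-specific backbones.
if __name__ == "__main__":
    block = CSCABlockSketch(dim=64)
    rgb_feat = torch.randn(2, 64, 32, 32)
    thermal_feat = torch.randn(2, 64, 32, 32)
    fused = block(rgb_feat, thermal_feat)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

Because the block takes two same-shaped feature maps and returns one, it can be dropped between corresponding stages of any pair of modality-specific backbones, which is consistent with the abstract's claim that CSCA integrates into arbitrary architectures.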


Related research:

12/08/2020  Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting
    Crowd counting is a fundamental yet challenging problem, which desires r...

09/13/2023  Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection
    RGB-T saliency detection has emerged as an important computer vision tas...

02/28/2023  RGB-D Grasp Detection via Depth Guided Learning with Cross-modal Attention
    Planar grasp detection is one of the most fundamental tasks to robotic m...

09/28/2022  Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection
    Integrating multispectral data in object detection, especially visible a...

08/08/2020  Cross-modal Center Loss
    Cross-modal retrieval aims to learn discriminative and modal-invariant f...

11/01/2019  Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning
    This paper addresses the challenging task of video captioning which aims...

09/12/2020  RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
    We study an important, yet largely unexplored problem of large-scale cro...
