RGB-T Multi-Modal Crowd Counting Based on Transformer

01/08/2023
by Zhengyi Liu, et al.

Crowd counting aims to estimate the number of people in a scene. Most state-of-the-art crowd counting methods based on color images do not work well under poor illumination, where objects become invisible. With the widespread use of infrared cameras, crowd counting based on both color and thermal images has been studied. Existing methods only perform multi-modal fusion, without any constraint from the counting objective. To better exploit multi-modal information, we use count-guided multi-modal fusion and modal-guided count enhancement to achieve strong performance. The proposed count-guided multi-modal fusion module uses a multi-scale token transformer to let the two modalities interact under the guidance of count information and to perceive different scales from the token perspective. The proposed modal-guided count enhancement module employs a multi-scale deformable transformer decoder structure to enhance one modality's features and the count information using the other modality. Experiments on the public RGBT-CC dataset show that our method sets a new state of the art. Code: https://github.com/liuzywen/RGBTCC
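
To make the idea of count-guided fusion more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a single count token is prepended to the RGB and thermal token sequences and that all tokens interact through one plain self-attention step. The function names (`self_attention`, `count_guided_fusion`), the identity projections, and the single-scale, single-head setup are illustrative assumptions; the paper's module is multi-scale and learned.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def count_guided_fusion(rgb_tokens, thermal_tokens, count_token):
    """Hypothetical fusion step: [count | RGB | thermal] tokens attend jointly.

    Returns the updated count token and the updated RGB and thermal tokens.
    """
    sequence = [count_token] + rgb_tokens + thermal_tokens
    fused = self_attention(sequence)
    n_rgb = len(rgb_tokens)
    return fused[0], fused[1:1 + n_rgb], fused[1 + n_rgb:]


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    rgb = [[random.random() for _ in range(dim)] for _ in range(3)]
    thermal = [[random.random() for _ in range(dim)] for _ in range(3)]
    count = [0.0] * dim  # an all-zero query attends uniformly to every token
    new_count, new_rgb, new_thermal = count_guided_fusion(rgb, thermal, count)
    print(len(new_count), len(new_rgb), len(new_thermal))
```

In this sketch the count token gathers information from both modalities, and each modality token can read from the count token and from the other modality, which is the kind of interaction the abstract describes at a high level.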

research
08/31/2022

NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation

Multi-modal MR imaging is routinely used in clinical practice to diagnos...
research
12/01/2021

Transformer-based Network for RGB-D Saliency Detection

RGB-D saliency detection integrates information from both RGB images and...
research
10/23/2022

Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation

Although human action anticipation is a task which is inherently multi-m...
research
05/19/2023

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

Transformers achieve promising performance in document understanding bec...
research
02/17/2022

TAFNet: A Three-Stream Adaptive Fusion Network for RGB-T Crowd Counting

In this paper, we propose a three-stream adaptive fusion network named T...
research
12/08/2020

Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

Crowd counting is a fundamental yet challenging problem, which desires r...
research
05/13/2021

Robust Dynamic Multi-Modal Data Fusion: A Model Uncertainty Perspective

This paper is concerned with multi-modal data fusion (MMDF) under unexpe...