Generative-based Fusion Mechanism for Multi-Modal Tracking

09/04/2023
by Zhangyong Tang, et al.

Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application to multi-modal tracking has remained relatively unexplored. In this context, we seek to uncover the potential of generative techniques for addressing a critical challenge in multi-modal tracking: information fusion. In this paper, we delve into two prominent GM techniques, namely Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Unlike the standard fusion process, where the features from each modality are fed directly into the fusion block, we condition these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances. This design excels at extracting discriminative clues from the features, enhancing the ultimate tracking performance. To quantitatively gauge the effectiveness of our approach, we conduct extensive experiments across two multi-modal tracking tasks, three baseline methods, and three challenging benchmarks. The experimental results demonstrate that the proposed generative-based fusion mechanism achieves state-of-the-art performance, setting new records on LasHeR and RGBD1K.
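
As a rough illustration of the conditioning idea in the abstract, and not the authors' implementation, the sketch below builds a generator input by concatenating the features of two modalities with a random noise vector, then passes it through a toy linear map standing in for a fusion generator. The function names (`make_generator_input`, `toy_generator`), the feature sizes, the noise dimension, and the random weights are all assumptions made for this example; the abstract does not specify any of them.

```python
import random


def make_generator_input(rgb_feat, aux_feat, noise_dim=4, rng=None):
    """Hypothetical helper: concatenate two modality features with Gaussian noise.

    The noise makes each training sample slightly different every time it is
    seen, which is one simple way to read "conditioning features with noise".
    """
    rng = rng or random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return list(rgb_feat) + list(aux_feat) + noise


def toy_generator(x, weights):
    """Hypothetical stand-in for a fusion generator: one linear layer, no bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
rgb = [0.2, 0.5, 0.1]             # made-up RGB feature vector
aux = [0.7, 0.3, 0.9]             # made-up auxiliary-modality feature vector

x = make_generator_input(rgb, aux, noise_dim=2, rng=rng)
weights = [[rng.uniform(-1.0, 1.0) for _ in x] for _ in range(3)]
fused = toy_generator(x, weights)

print(len(x), len(fused))         # prints: 8 3
```

In a real CGAN or diffusion setup the linear map would be a learned network trained with an adversarial or denoising objective; the sketch only shows where the noise enters relative to the modality features.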

research
05/02/2023

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

We abstract the features (i.e. learned representations) of multi-modal d...
research
08/28/2021

AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival Analysis with Whole Slide Images and Gene Expression Data

The use of multi-modal data such as the combination of whole slide image...
research
03/17/2020

M^5L: Multi-Modal Multi-Margin Metric Learning for RGBT Tracking

Classifying the confusing samples in the course of RGBT tracking is a qu...
research
06/21/2017

Multi-Modal Trip Hazard Affordance Detection On Construction Sites

Trip hazards are a significant contributor to accidents on construction ...
research
07/13/2022

Multi-modal Depression Estimation based on Sub-attentional Fusion

Failure to timely diagnose and effectively treat depression leads to ove...
research
04/17/2022

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study

Most of the existing work in one-stage referring expression comprehensio...
research
09/01/2022

Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets

When creating 3D content, highly specialized skills are generally needed...
