TFusion: Transformer based N-to-One Multimodal Fusion Block

08/26/2022
by Zecheng Liu, et al.

People perceive the world through different senses, such as sight, hearing, smell, and touch. Processing and fusing information from multiple modalities helps artificial intelligence systems understand the world more easily. However, when modalities are missing, the number of available modalities varies from one situation to another, which leads to an N-to-One fusion problem. To solve this problem, we propose a transformer-based fusion block called TFusion. Unlike preset formulations or convolution-based methods, the proposed block automatically learns to fuse the available modalities without synthesizing or zero-padding the missing ones. Specifically, the feature representations extracted by an upstream processing model are projected as tokens and fed into transformer layers to generate latent multimodal correlations. Then, to reduce the dependence on particular modalities, a modal attention mechanism is introduced to build a shared representation, which can be used by the downstream decision model. The proposed TFusion block can be easily integrated into existing multimodal analysis networks. In this work, we apply TFusion to different backbone networks for multimodal human activity recognition and brain tumor segmentation tasks. Extensive experimental results show that the TFusion block achieves better performance than the competing fusion strategies.
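The N-to-One idea in the abstract (tokens from the available modalities, attention across those tokens, then a modal attention step that yields one shared representation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes or formulas, so everything below is an assumption made for illustration. The sketch uses a single self-attention pass with identity projections in place of learned transformer layers, a fixed hypothetical `score_vector` in place of learned modal attention parameters, and hypothetical modality names. The point it shows is only that the fused output has the same size whether one or three modalities are present, with no zero-padding of missing ones.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product attention across modality tokens.

    Assumption: identity query/key/value projections stand in for the
    learned transformer layers mentioned in the abstract.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def modal_attention(tokens, score_vector):
    """Softmax-weighted sum over modality tokens, giving one shared vector.

    Assumption: a single scoring vector replaces whatever learned scoring
    the actual modal attention mechanism uses.
    """
    weights = softmax([dot(t, score_vector) for t in tokens])
    d = len(tokens[0])
    fused = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    return fused, weights


def fuse(features, score_vector):
    """Fuse only the modalities that are present.

    features: dict mapping a modality name to its feature vector.
    Missing modalities are simply absent from the dict.
    """
    if not features:
        raise ValueError("at least one modality is required")
    tokens = self_attention(list(features.values()))
    return modal_attention(tokens, score_vector)


# Hypothetical example data: three modalities with 4-dimensional features.
rng = random.Random(0)
dim = 4
score_vector = [rng.uniform(-1, 1) for _ in range(dim)]
all_modalities = {
    name: [rng.uniform(-1, 1) for _ in range(dim)]
    for name in ("accelerometer", "gyroscope", "video")
}

fused_all, w_all = fuse(all_modalities, score_vector)

# Only one modality available: same output size, no padding required.
subset = {"accelerometer": all_modalities["accelerometer"]}
fused_one, w_one = fuse(subset, score_vector)
```

Because both attention steps normalize over whatever tokens are present, the same function handles any N of at least one. With a single modality, the weights collapse to 1 and the fused vector equals that modality's token.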


