MMTM: Multimodal Transfer Module for CNN Fusion

11/20/2019
by Hamid Reza Vaezi Joze, et al.

In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream, and the scores of each modality are fused at the end. Due to its simplicity, late fusion is still the predominant approach in many state-of-the-art multimodal applications. In this paper, we present a simple neural network module for leveraging the knowledge from multiple modalities in convolutional neural networks. The proposed unit, named Multimodal Transfer Module (MMTM), can be added at different levels of the feature hierarchy, enabling slow modality fusion. Using squeeze and excitation operations, MMTM utilizes the knowledge of multiple modalities to recalibrate the channel-wise features in each CNN stream. Unlike other intermediate fusion methods, the proposed module can fuse features from convolution layers with different spatial dimensions. Another advantage of the proposed method is that it can be added among unimodal branches with minimal changes to their network architectures, allowing each branch to be initialized with existing pretrained weights. Experimental results show that our framework improves the recognition accuracy of well-known multimodal networks. We demonstrate state-of-the-art or competitive performance on four datasets that span the task domains of dynamic hand gesture recognition, speech enhancement, and action recognition with RGB and body joints.
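The abstract describes MMTM only at a high level: squeeze the features of each stream, combine them, and use the combined information to recalibrate channels in every stream. The NumPy sketch below illustrates that idea for two streams whose feature maps have different spatial shapes. It is an illustration under stated assumptions, not the authors' implementation: the class name `MMTMSketch`, the `reduction` factor, the ReLU on the joint vector, and plain sigmoid gating are all hypothetical choices, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)


def squeeze(features: np.ndarray) -> np.ndarray:
    """Global average pooling over every non-channel axis.

    Input shape is (C, *spatial); output shape is (C,), so streams with
    different spatial sizes are reduced to plain channel vectors.
    """
    return features.reshape(features.shape[0], -1).mean(axis=1)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class MMTMSketch:
    """Hypothetical two-stream MMTM-style block with random, untrained weights."""

    def __init__(self, c_a: int, c_b: int, reduction: int = 4):
        # Size of the shared joint representation (assumed, not from the paper text).
        c_z = max(1, (c_a + c_b) // reduction)
        self.w_z = rng.standard_normal((c_z, c_a + c_b)) * 0.1
        self.b_z = np.zeros(c_z)
        self.w_a = rng.standard_normal((c_a, c_z)) * 0.1
        self.b_a = np.zeros(c_a)
        self.w_b = rng.standard_normal((c_b, c_z)) * 0.1
        self.b_b = np.zeros(c_b)

    def __call__(self, feat_a: np.ndarray, feat_b: np.ndarray):
        # Squeeze: one descriptor per channel for each modality, then concatenate.
        s = np.concatenate([squeeze(feat_a), squeeze(feat_b)])
        # Joint representation shared by both streams (ReLU is an assumption).
        z = np.maximum(self.w_z @ s + self.b_z, 0.0)
        # Excitation: a separate per-channel gate in (0, 1) for each stream.
        gate_a = sigmoid(self.w_a @ z + self.b_a)
        gate_b = sigmoid(self.w_b @ z + self.b_b)
        # Recalibrate: scale each channel, broadcasting over that stream's own axes.
        gate_a = gate_a.reshape((-1,) + (1,) * (feat_a.ndim - 1))
        gate_b = gate_b.reshape((-1,) + (1,) * (feat_b.ndim - 1))
        return feat_a * gate_a, feat_b * gate_b


# Usage: a 2D visual feature map (C, H, W) and a 1D audio feature map (C, T).
rgb = rng.standard_normal((64, 7, 7))
audio = rng.standard_normal((32, 100))
mmtm = MMTMSketch(c_a=64, c_b=32)
out_a, out_b = mmtm(rgb, audio)
print(out_a.shape, out_b.shape)
```

Because the squeeze step collapses every non-channel axis, the joint layer only ever sees fixed-length channel vectors, whatever the height, width, or time length of each stream. This is one way to see why such a module can connect layers with different spatial dimensions, and why it leaves each branch's own layers (and hence its pretrained weights) untouched.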
