Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

07/17/2018
by Jaekyum Kim, et al.

The goal of multi-modal learning is to use the complementary information about the task provided by multiple modalities to achieve reliable and robust performance. Recently, deep learning has led to significant improvements in multi-modal learning by allowing for information fusion at intermediate feature levels. This paper addresses the problem of designing a robust deep multi-modal learning architecture in the presence of imperfect modalities. We introduce a deep fusion architecture for object detection that processes each modality with a separate convolutional neural network (CNN) and constructs a joint feature map by combining the intermediate features from these CNNs. To achieve robustness to degraded modalities, we employ a gated information fusion (GIF) network that weights the contribution of each modality according to the input feature maps to be fused. The weights are computed by convolutional layers followed by a sigmoid function and are trained along with the rest of the fusion network in an end-to-end fashion. Our experiments show that the proposed GIF network offers additional architectural flexibility for achieving robust performance when some modalities are degraded, and demonstrate a significant performance improvement on the KITTI dataset using a Single Shot Detector (SSD) backbone together with the proposed fusion network and data augmentation schemes.
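To make the gating mechanism concrete, below is a minimal sketch of gated information fusion for two modalities. It assumes PyTorch; the GatedFusion class, channel sizes, and the exact gate topology are illustrative assumptions rather than the paper's reported configuration. The key idea it shows is the one described in the abstract: gate weights are produced by convolutional layers followed by a sigmoid, applied to each modality's feature map, and the gated features are then combined into a joint feature map.

```python
# Minimal PyTorch sketch of a gated information fusion (GIF) block for two
# modalities. Layer sizes and topology are illustrative assumptions, not the
# exact configuration from the paper.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Gate generators: look at both feature maps and emit a [0, 1] weight
        # map per modality via a conv layer followed by a sigmoid.
        self.gate_a = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.gate_b = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # 1x1 conv that mixes the gated features into a joint feature map.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([feat_a, feat_b], dim=1)
        w_a = self.gate_a(joint)  # contribution weight for modality A
        w_b = self.gate_b(joint)  # contribution weight for modality B
        gated = torch.cat([feat_a * w_a, feat_b * w_b], dim=1)
        return self.fuse(gated)

# Example: fuse intermediate CNN features from two modalities (e.g., RGB + lidar).
if __name__ == "__main__":
    fusion = GatedFusion(channels=256)
    rgb_feat = torch.randn(1, 256, 38, 38)    # e.g., an SSD-style feature map
    lidar_feat = torch.randn(1, 256, 38, 38)
    fused = fusion(rgb_feat, lidar_feat)
    print(fused.shape)  # torch.Size([1, 256, 38, 38])
```

Because the gates are trained end-to-end with the detector, a heavily corrupted modality can be driven toward zero weight at the affected spatial locations while the other modality's features still pass through.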


Related research

06/21/2021 · Improving Multi-Modal Learning with Uni-Modal Teachers
Learning multi-modal representations is an essential step towards real-w...

09/25/2022 · Multimodal Learning with Channel-Mixing and Masked Autoencoder on Facial Action Unit Detection
Recent studies utilizing multi-modal data aimed at building a robust mod...

01/29/2019 · Deep Neural Networks with Auxiliary-Model Regulated Gating for Resilient Multi-Modal Sensor Fusion
Deep neural networks allow for fusion of high-level features from multip...

10/08/2018 · Optimized Gated Deep Learning Architectures for Sensor Fusion
Sensor fusion is a key technology that integrates various sensory inputs...

08/24/2023 · SkipcrossNets: Adaptive Skip-cross Fusion for Road Detection
Multi-modal fusion is increasingly being used for autonomous driving tas...

09/28/2021 · Fail-Safe Human Detection for Drones Using a Multi-Modal Curriculum Learning Approach
Drones are currently being explored for safety-critical applications whe...

02/16/2023 · NUAA-QMUL-AIIT at Memotion 3: Multi-modal Fusion with Squeeze-and-Excitation for Internet Meme Emotion Analysis
This paper describes the participation of our NUAA-QMUL-AIIT team in the...
