
Deep Multimodal Fusion by Channel Exchanging

by   Yikai Wang, et al.

Deep multimodal fusion, which uses multiple sources of data for classification or regression, has shown a clear advantage over its unimodal counterpart in various applications. Yet current methods, including aggregation-based and alignment-based fusion, still struggle to balance the trade-off between inter-modal fusion and intra-modal processing, which creates a bottleneck for further performance gains. To this end, this paper proposes the Channel-Exchanging-Network (CEN), a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. Specifically, the channel exchanging process is self-guided by individual channel importance, measured by the magnitude of the Batch-Normalization (BN) scaling factor during training. The validity of this exchanging process is also guaranteed by sharing convolutional filters while keeping separate BN layers across modalities, which, as an added benefit, allows our multimodal architecture to be almost as compact as a unimodal network. Extensive experiments on semantic segmentation with RGB-D data and on image translation with multi-domain input verify the effectiveness of CEN compared to current state-of-the-art methods. Detailed ablation studies have also been carried out, which affirm the advantage of each component we propose. Our code is available at
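The exchange rule described in the abstract can be illustrated with a minimal NumPy sketch for two modalities. This is not the authors' released code: the function name `exchange_channels`, the fixed `threshold` value, the hard replacement of a weak channel by the other modality's channel at the same index, and the toy tensors are all assumptions made for illustration. The only idea taken from the abstract is that a channel whose BN scaling factor has a small magnitude is treated as unimportant and is exchanged.

```python
import numpy as np


def exchange_channels(feat_a, feat_b, gamma_a, gamma_b, threshold=0.02):
    """Replace weak channels of each modality with the other modality's channels.

    feat_a, feat_b: feature maps of shape (C, H, W), one per modality.
    gamma_a, gamma_b: per-channel BN scaling factors of shape (C,).
    threshold: hypothetical cutoff; a channel counts as weak if |gamma| < threshold.
    """
    weak_a = np.abs(gamma_a) < threshold  # channels modality A barely uses
    weak_b = np.abs(gamma_b) < threshold  # channels modality B barely uses

    out_a = feat_a.copy()
    out_b = feat_b.copy()
    # Boolean masks index the channel axis, so whole (H, W) maps are swapped in.
    out_a[weak_a] = feat_b[weak_a]
    out_b[weak_b] = feat_a[weak_b]
    return out_a, out_b


# Toy example: 4 channels, 2x2 spatial maps (values chosen only for readability).
feat_rgb = np.ones((4, 2, 2))
feat_depth = np.full((4, 2, 2), 2.0)
gamma_rgb = np.array([0.5, 0.001, 0.3, 0.0])
gamma_depth = np.array([0.01, 0.4, 0.2, 0.6])

out_rgb, out_depth = exchange_channels(feat_rgb, feat_depth, gamma_rgb, gamma_depth)
```

In this toy case, RGB channels 1 and 3 fall below the cutoff and take on the depth values, while depth channel 0 takes on the RGB values. The sketch adds no learnable parameters, which is consistent with the abstract's description of the framework as parameter-free.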




Code Repositories


[NeurIPS 2020] Code release for "Deep Multimodal Fusion by Channel Exchanging"
