Modality To Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion

11/18/2019
by Sijie Mai, et al.

Learning a joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap that heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Because the distributions of different modalities vary in nature, we reduce the modality gap by translating the distributions of the source modalities into that of the target modality via their respective encoders, using adversarial training. Furthermore, we impose additional constraints on the embedding space by introducing reconstruction and classification losses. We then fuse the encoded representations using a hierarchical graph neural network that explicitly explores unimodal, bimodal, and trimodal interactions in multiple stages. Our method achieves state-of-the-art performance on multiple datasets. Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative.
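
To make the loss structure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes bias-free linear encoders, decoders, a linear discriminator, and a linear classifier, a standard GAN-style adversarial objective, and equal loss weights. The modality names, dimensions, and every function name here are hypothetical choices for illustration; the graph fusion stage is omitted.

```python
import math
import random

random.seed(0)

# Hypothetical sizes and modality names; none of these come from the paper.
INPUT_DIMS = {"acoustic": 5, "visual": 6, "language": 8}
TARGET = "language"          # assumed target modality
EMBED_DIM, NUM_CLASSES = 4, 2

def rand_matrix(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def linear(weights, x):
    """Bias-free linear map: one dot product per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# One encoder and decoder per modality; discriminator and classifier are shared.
encoders = {m: rand_matrix(EMBED_DIM, d) for m, d in INPUT_DIMS.items()}
decoders = {m: rand_matrix(d, EMBED_DIM) for m, d in INPUT_DIMS.items()}
discriminator = rand_matrix(1, EMBED_DIM)
classifier = rand_matrix(NUM_CLASSES, EMBED_DIM)

def make_sample():
    """Random feature vector per modality, standing in for real data."""
    return {m: [random.uniform(-1, 1) for _ in range(d)]
            for m, d in INPUT_DIMS.items()}

def compute_losses(sample, label):
    """Forward pass returning the loss terms for one labelled sample."""
    z = {m: linear(encoders[m], x) for m, x in sample.items()}
    # Discriminator's probability that an embedding comes from the target modality.
    p = {m: sigmoid(linear(discriminator, z[m])[0]) for m in z}
    sources = [m for m in z if m != TARGET]
    # Discriminator objective: target embeddings -> 1, source embeddings -> 0.
    d_loss = -math.log(p[TARGET]) - sum(math.log(1.0 - p[m]) for m in sources)
    # Source encoders try to fool the discriminator (assumed GAN-style loss).
    adv = -sum(math.log(p[m]) for m in sources)
    # Reconstruction: each decoder maps the embedding back to its input.
    recon = sum(mse(linear(decoders[m], z[m]), sample[m]) for m in z)
    # Classification: cross-entropy of the shared classifier on every embedding.
    cls = sum(-math.log(softmax(linear(classifier, z[m]))[label]) for m in z)
    # Encoder-side total with assumed unit weights on each term.
    return {"discriminator": d_loss, "adversarial": adv,
            "reconstruction": recon, "classification": cls,
            "encoder_total": adv + recon + cls}

if __name__ == "__main__":
    for name, value in compute_losses(make_sample(), label=1).items():
        print(f"{name}: {value:.4f}")
```

In a real training loop the discriminator loss and the encoder-side total would be minimized in alternation, each updating only its own parameters; the sketch computes the terms but performs no optimization.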

