DeMIAN: Deep Modality Invariant Adversarial Network

12/23/2016
by   Kuniaki Saito, et al.

Obtaining common representations from different modalities is important because it makes the modalities interchangeable in a classification problem: for example, a classifier trained on image features in the common representation space can then be applied to text features in the same space. Existing multi-modal representation learning methods mainly aim to extract rich information from paired samples and to train a classifier with the corresponding labels; however, collecting paired samples together with their labels incurs high labor costs. It is much easier to collect paired multi-modal samples without labels and, independently, single-modal data with labels, than to collect labeled multi-modal data. To obtain common representations in such a setting, we propose to make the distributions over the different modalities similar in the learned representations, yielding modality-invariant representations. In particular, we propose a novel algorithm for modality-invariant representation learning, named Deep Modality Invariant Adversarial Network (DeMIAN), which utilizes the idea of Domain Adaptation (DA). Using the modality-invariant representations learned by DeMIAN, we achieve better classification accuracy than state-of-the-art methods, especially on several benchmark datasets for zero-shot learning.
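To make the domain-adaptation idea concrete, below is a minimal sketch (not the authors' released code) of adversarial modality alignment with a gradient reversal layer: per-modality encoders map features into a shared space, a modality discriminator tries to tell the modalities apart, and the reversed gradient pushes the encoders toward modality-invariant representations while a label classifier is trained on the single labeled modality. All module names, feature sizes, and the loss weighting are illustrative assumptions, and the sketch omits the paired-sample term of the full method.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign so the encoders learn to fool the modality discriminator.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Illustrative encoders: e.g. 4096-d CNN image features and 300-d text vectors.
image_encoder = nn.Sequential(nn.Linear(4096, 256), nn.ReLU())
text_encoder  = nn.Sequential(nn.Linear(300, 256), nn.ReLU())
modality_disc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
label_clf     = nn.Linear(256, 10)  # trained with labels from one modality only

ce = nn.CrossEntropyLoss()
params = (list(image_encoder.parameters()) + list(text_encoder.parameters())
          + list(modality_disc.parameters()) + list(label_clf.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)

def training_step(img_feats, txt_feats, img_labels, lam=1.0):
    z_img, z_txt = image_encoder(img_feats), text_encoder(txt_feats)

    # Adversarial loss: the discriminator predicts the modality; the reversed
    # gradient makes the two encoded distributions indistinguishable.
    z_all = grad_reverse(torch.cat([z_img, z_txt], dim=0), lam)
    mod_labels = torch.cat([torch.zeros(len(z_img)),
                            torch.ones(len(z_txt))]).long()
    adv_loss = ce(modality_disc(z_all), mod_labels)

    # Supervised loss uses labels from the single labeled modality.
    cls_loss = ce(label_clf(z_img), img_labels)

    loss = cls_loss + adv_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Because the learned space is shared, the classifier trained on image features in this sketch can be evaluated directly on `text_encoder` outputs, which is the interchangeability the abstract describes.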


