Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation

12/26/2016
by Gwangbeen Park, et al.

We present a novel method for image-text multi-modal representation learning. To our knowledge, this is the first work to apply the adversarial learning concept to multi-modal learning and to learn multi-modal features without exploiting image-text pair information. In contrast to most previous methods, which use image-text pair information for multi-modal embedding, we use only category information. We show that multi-modal features can be learned without image-text pair information, and that our method brings the image and text distributions closer together in the multi-modal feature space than other methods that do use pair information. We also show that our multi-modal features carry universal semantic information, even though they were trained for category prediction. Our model is trained end-to-end by backpropagation, is intuitive, and is easily extended to other multi-modal learning tasks.
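The abstract does not spell out how the adversarial signal is backpropagated. One common way to get adversarial learning in a single backward pass is gradient reversal: a modality discriminator is trained to tell image features from text features, while the gradient flowing back into the encoder is negated, so the encoder learns features the discriminator cannot separate. The sketch below is a minimal scalar illustration under that assumption, not the authors' implementation. The names `adversarial_grads`, `w`, `v`, and `lam` are hypothetical, and the category-prediction loss mentioned in the abstract is omitted for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def adversarial_grads(x, modality, w, v, lam=1.0):
    """One forward/backward pass with a reversed gradient (illustrative only).

    x        : scalar input (hypothetical stand-in for an image or text feature)
    modality : 1.0 if x comes from an image, 0.0 if it comes from text
    w        : encoder weight, shared feature f = w * x
    v        : discriminator weight, p(image) = sigmoid(v * f)
    lam      : strength of the gradient reversal (assumed hyperparameter)
    """
    f = w * x                      # shared multi-modal feature
    p = sigmoid(v * f)             # discriminator's belief that x is an image
    loss = -(modality * math.log(p) + (1.0 - modality) * math.log(1.0 - p))

    dloss_dlogit = p - modality    # derivative of cross-entropy w.r.t. the logit
    grad_v = dloss_dlogit * f      # discriminator: ordinary gradient
    grad_f = dloss_dlogit * v      # gradient arriving at the feature
    grad_w = -lam * grad_f * x     # encoder: gradient is negated before use
    return loss, grad_v, grad_w


# Usage: a plain gradient-descent step on both parameters.
loss, grad_v, grad_w = adversarial_grads(x=1.0, modality=1.0, w=1.0, v=1.0)
lr = 0.1
v_new = 1.0 - lr * grad_v          # discriminator gets better at separating modalities
w_new = 1.0 - lr * grad_w          # encoder moves to make modalities harder to separate
```

Because only the sign of the encoder's gradient changes, both players are updated by the same backward pass, which is consistent with the abstract's description of the model as end-to-end backpropagation.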

research
05/24/2022

Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural Networks

Learning effective recipe representations is essential in food studies. ...
research
07/28/2019

Two-Stream CNN with Loose Pair Training for Multi-modal AMD Categorization

This paper studies automated categorization of age-related macular degen...
research
04/26/2021

Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data

This paper studies the problem of novel category discovery on single- an...
research
07/14/2023

MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

Multi-modal sarcasm detection has attracted much recent attention. Never...
research
09/14/2022

ImageArg: A Multi-modal Tweet Dataset for Image Persuasiveness Mining

The growing interest in developing corpora of persuasive texts has promo...
research
10/09/2021

Embed Everything: A Method for Efficiently Co-Embedding Multi-Modal Spaces

Any general artificial intelligence system must be able to interpret, op...
research
07/23/2020

METEOR: Learning Memory and Time Efficient Representations from Multi-modal Data Streams

Many learning tasks involve multi-modal data streams, where continuous d...
