Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking

05/15/2017
by Vedran Vukotic, et al.

Continuous multimodal representations suitable for multimodal information retrieval are usually obtained with methods that rely heavily on multimodal autoencoders. In video hyperlinking, a task that aims at retrieving video segments, the state of the art is a variation of two interlocked networks working in opposing directions. These systems provide good multimodal embeddings and can also translate from one representation space to the other. Because they operate on representation spaces, however, these networks cannot operate in the original spaces (text or image), which makes the crossmodal function difficult to visualize, and they do not generalize well to unseen data. Recently, generative adversarial networks (GANs) have gained popularity and have been used both for generating realistic synthetic data and for obtaining high-level, single-modal latent representation spaces. In this work, we evaluate the feasibility of using GANs to obtain multimodal representations. We show that GANs can be used for multimodal representation learning and that the multimodal representations they provide are superior to those obtained with multimodal autoencoders. Additionally, we illustrate the ability to visualize crossmodal translations, which can provide human-interpretable insights into learned GAN-based video hyperlinking models.
