Vector Learning for Cross Domain Representations

09/27/2018
by Shagan Sah, et al.

Generative adversarial networks have recently gained popularity for image generation tasks, but such models involve complex learning mechanisms and demand very large, relevant datasets. This work borrows concepts from image and video captioning models to form an image generation framework. The model is trained in the same fashion as a recurrent captioning model, and the learned weights are then used for generation in the inverse direction: the input is a caption and the output is an image. Vector representations of sentences and frames are extracted from an encoder-decoder model that is first trained on matched sentence and image pairs, so that image generation is conditioned on a natural language caption. We further leverage a sequence-to-sequence model to generate synthetic paraphrase captions with the same meaning, which makes image generation more robust. A key advantage of this method is that traditional image captioning datasets can be reused to produce the synthetic sentence paraphrases. Results indicate that images generated from multiple captions better capture the semantic meaning of the family of captions.
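The core idea, caption → sentence vector → image, and the pooling of several paraphrase captions into one conditioning vector, can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the vocabulary, the mean-of-embeddings encoder, and the single linear decoder are all stand-ins for the learned recurrent encoder-decoder weights the paper transfers from a captioning model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; in the paper the encoder is a trained
# recurrent captioning model, not random embeddings.
VOCAB = {"a": 0, "red": 1, "bird": 2, "flying": 3, "scarlet": 4}
EMBED_DIM, IMG_SIDE = 16, 8

embeddings = rng.normal(size=(len(VOCAB), EMBED_DIM))
decoder_w = rng.normal(size=(EMBED_DIM, IMG_SIDE * IMG_SIDE))

def encode_caption(caption):
    """Map a caption to a fixed-length vector (here: mean of word embeddings)."""
    ids = [VOCAB[w] for w in caption.split()]
    return embeddings[ids].mean(axis=0)

def decode_image(vec):
    """Project a sentence vector onto an image grid (linear decoder stand-in)."""
    return np.tanh(vec @ decoder_w).reshape(IMG_SIDE, IMG_SIDE)

# Pooling the vectors of several same-meaning paraphrases before decoding
# mirrors the paper's idea of conditioning on a family of captions.
captions = ["a red bird flying", "a scarlet bird flying"]
pooled = np.mean([encode_caption(c) for c in captions], axis=0)
image = decode_image(pooled)
print(image.shape)  # (8, 8)
```

The pooling step is the only structural change relative to single-caption generation: because both captions map into the same sentence-vector space, their mean is itself a valid conditioning vector for the decoder.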

Related research

- Bornon: Bengali Image Captioning with Transformer-based Deep learning approach (09/11/2021)
  Image captioning using Encoder-Decoder based approach where CNN is used ...

- LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting (05/31/2023)
  Multilingual image captioning has recently been tackled by training with...

- Diverse and Styled Image Captioning Using SVD-Based Mixture of Recurrent Experts (07/07/2020)
  With great advances in vision and natural language processing, the gener...

- Image Representations and New Domains in Neural Image Captioning (08/09/2015)
  We examine the possibility that recent promising results in automatic ca...

- Better Captioning with Sequence-Level Exploration (03/08/2020)
  Sequence-level learning objective has been widely used in captioning tas...

- Text-to-Image Generation with Attention Based Recurrent Neural Networks (01/18/2020)
  Conditional image modeling based on textual descriptions is a relatively...

- Imperial College London Submission to VATEX Video Captioning Task (10/16/2019)
  This paper describes the Imperial College London team's submission to th...
