Diverse and Styled Image Captioning Using SVD-Based Mixture of Recurrent Experts

07/07/2020
by Marzi Heidari, et al.

With the rapid advances in computer vision and natural language processing, automatic image captioning has become a practical need. In a recent paper, Mathews, Xie and He [1] presented a model that generates styled captions by separating semantics from style. Continuing that work, we develop a new captioning model consisting of an image encoder that extracts features, a mixture of recurrent networks that maps the extracted features to a set of words, and a sentence generator that combines those words into a stylized sentence. The resulting system, called Mixture of Recurrent Experts (MoRE), uses a new training algorithm that applies singular value decomposition (SVD) to the weight matrices of the Recurrent Neural Networks (RNNs) to increase the diversity of captions. Each decomposition step depends on a distinct factor determined by the number of RNNs in MoRE. Because the sentence generator works from a stylized language corpus without paired images, our captioning model can do the same. Moreover, the styled and diverse captions are obtained without training on a densely labeled or styled dataset. To validate the model, we use Microsoft COCO, a standard factual image caption dataset. We show that the proposed model can generate diverse and stylized image captions without extra labeling. The results also show better descriptions in terms of content accuracy.
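The abstract does not spell out how the SVD step is applied, so the following is only an illustrative sketch of the general idea: decomposing a recurrent weight matrix and giving each expert a different truncation. The function names, the shared `base` matrix, and the rank schedule in `expert_ranks` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def truncated_svd_weights(weight, rank):
    """Return the best rank-`rank` approximation of `weight` via SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

def expert_ranks(full_rank, num_experts):
    """Hypothetical schedule: expert k keeps a (k+1)/num_experts share
    of the singular values, so the factor depends on the number of experts."""
    return [max(1, (full_rank * (k + 1)) // num_experts)
            for k in range(num_experts)]

rng = np.random.default_rng(0)
hidden = 8
# Stand-in for a recurrent (hidden-to-hidden) weight matrix.
base = rng.standard_normal((hidden, hidden))

ranks = expert_ranks(hidden, num_experts=4)
# One weight matrix per expert, each a different low-rank view of `base`.
experts = [truncated_svd_weights(base, r) for r in ranks]
```

In this sketch, experts with fewer retained singular values see a coarser version of the same weights, which is one simple way a decomposition tied to the number of experts could make their outputs differ.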

research
02/16/2023

Retrieval-augmented Image Captioning

Inspired by retrieval-augmented language generation and pretrained Visio...
research
05/31/2023

LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting

Multilingual image captioning has recently been tackled by training with...
research
07/25/2018

Distinctive-attribute Extraction for Image Captioning

Image captioning, an open research issue, has been evolved with the prog...
research
09/27/2018

Vector Learning for Cross Domain Representations

Recently, generative adversarial networks have gained a lot of popularit...
research
12/13/2021

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning

Text-based image captioning (TextCap) requires simultaneous comprehensio...
research
10/02/2020

CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns

Recently, Deep Learning (DL) methods have shown an excellent performance...
research
09/30/2020

Creative Captioning: An AI Grand Challenge Based on the Dixit Board Game

We propose a new class of "grand challenge" AI problems that we call cre...
