Diverse Image Captioning with Grounded Style

05/03/2022
by Franz Klein, et al.

Stylized image captioning, as presented in prior work, aims to generate captions that reflect characteristics beyond a factual description of the scene composition, such as sentiments. Such prior work relies on given sentiment identifiers, which are used to express a certain global style in the caption, e.g., positive or negative, but does not take into account the stylistic content of the visual scene. To address this shortcoming, we first analyze the limitations of current stylized captioning datasets and propose COCO attribute-based augmentations to obtain varied stylized captions from COCO annotations. Furthermore, we encode the stylized information in the latent space of a Variational Autoencoder; specifically, we leverage extracted image attributes to explicitly structure its sequential latent space according to different localized style characteristics. Our experiments on the Senticap and COCO datasets show that our approach generates accurate captions with diverse styles that are grounded in the image.
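
The abstract does not spell out how the attribute-based augmentation is performed. As a rough illustration only, one way such an augmentation could work is to insert an image attribute in front of the object noun it describes in a factual caption. The function names, the dictionary format, and the example attributes below are hypothetical and are not taken from the paper.

```python
import re


def augment_caption(caption, object_attributes):
    """Insert the first listed attribute before the first mention of each noun.

    `object_attributes` maps an object noun to a list of attribute words.
    This input format is an assumption made for illustration.
    """
    augmented = caption
    for noun, attributes in object_attributes.items():
        if not attributes:
            continue
        # Match the noun as a whole word, ignoring case.
        pattern = re.compile(r"\b" + re.escape(noun) + r"\b", flags=re.IGNORECASE)
        attribute = attributes[0]
        # Only the first occurrence of the noun is modified.
        augmented = pattern.sub(
            lambda m, a=attribute: f"{a} {m.group(0)}", augmented, count=1
        )
    return augmented


def augment_all(caption, object_attributes):
    """Return one caption variant per (noun, attribute) pair found in the caption."""
    variants = []
    for noun, attributes in object_attributes.items():
        for attribute in attributes:
            variant = augment_caption(caption, {noun: [attribute]})
            # Skip pairs whose noun does not occur in the caption.
            if variant != caption:
                variants.append(variant)
    return variants


if __name__ == "__main__":
    caption = "a dog sits on a bench"
    attributes = {"dog": ["happy", "lonely"], "bench": ["wooden"]}
    for variant in augment_all(caption, attributes):
        print(variant)
```

A single factual caption yields several variants here, each tied to an attribute of a specific object, which is the sense in which the resulting style can be called grounded in the image. A realistic pipeline would also need to handle articles ("a" versus "an"), plurals, and synonyms, all of which this sketch ignores.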

