Controllable Image Captioning via Prompting

12/04/2022
by   Ning Wang, et al.
Despite the remarkable progress of image captioning, existing captioners typically lack the controllability to generate desired image captions, e.g., describing the image in a rough or detailed manner, or from a factual or emotional view. In this paper, we show that a single unified model can perform well in diverse domains and switch freely among multiple styles. This controllability is achieved by embedding prompt learning into the image captioning framework. Specifically, we design a set of prompts to fine-tune the pre-trained image captioner. These prompts allow the model to absorb stylized data from different domains for joint training, without performance degradation in any domain. Furthermore, we optimize the prompts as learnable vectors in the continuous word embedding space, which avoids heuristic prompt engineering while exhibiting superior performance. At inference time, our model generates the desired stylized captions by choosing the corresponding prompts. Extensive experiments verify the controllable capability of the proposed method. Notably, we achieve outstanding performance on two diverse image captioning benchmarks, the COCO Karpathy split and TextCaps, using a unified model.
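The idea of selecting a style by choosing its prompt can be illustrated with a minimal sketch. It is not the authors' implementation: the style names, the sizes `EMBED_DIM` and `PROMPT_LEN`, and the helper `build_decoder_input` are all hypothetical, and the prompt vectors are only randomly initialised here, whereas in prompt learning they would be updated by gradient descent during fine-tuning.

```python
import random

# Hypothetical sizes and style names, chosen only for illustration.
EMBED_DIM = 8
PROMPT_LEN = 4
STYLES = ["short", "detailed", "positive", "negative"]

rng = random.Random(0)


def random_vectors(n, dim):
    """Return n vectors of length dim with small Gaussian entries."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n)]


# One block of continuous prompt vectors per style. During training these
# would be learnable parameters; here they are plain lists of floats.
prompt_table = {style: random_vectors(PROMPT_LEN, EMBED_DIM) for style in STYLES}


def build_decoder_input(style, caption_embeddings):
    """Prepend the chosen style's prompt vectors to the caption token embeddings."""
    if style not in prompt_table:
        raise KeyError(f"unknown style: {style!r}")
    return prompt_table[style] + caption_embeddings


# The same caption embeddings, conditioned on two different styles.
caption = random_vectors(6, EMBED_DIM)
short_in = build_decoder_input("short", caption)
detailed_in = build_decoder_input("detailed", caption)
print(len(short_in), len(short_in[0]))  # sequence length, embedding size
```

Under these assumptions, switching style at inference amounts to swapping which block of prompt vectors is prepended; the captioner's own weights stay shared across styles.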

