Show, Edit and Tell: A Framework for Editing Image Captions

03/06/2020
by Fawaz Sammani, et al.

Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing an existing caption can be easier than generating a new one from scratch. Intuitively, when editing captions, a model does not need to learn information that is already present in the caption (i.e., sentence structure), so it can focus on fixing details (e.g., replacing repetitive words). This paper proposes a novel approach to image captioning based on iterative adaptive refinement of an existing caption. Specifically, our caption-editing model consists of two sub-modules: (1) EditNet, a language module with an adaptive copy mechanism (Copy-LSTM) and a Selective Copy Memory Attention mechanism (SCMA), and (2) DCNet, an LSTM-based denoising auto-encoder. These components enable our model to copy directly from existing captions and to modify them. Experiments demonstrate that our approach achieves state-of-the-art performance on the MS COCO dataset both with and without sequence-level training.
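Editing by copying means the decoder needs some way to choose, at each step, between reusing a word from the existing caption and generating a word from the vocabulary. The sketch below shows one generic way to do this: a pointer-generator-style gate that blends a vocabulary softmax with an attention distribution over the existing caption's tokens. It is an illustrative assumption, not the paper's Copy-LSTM or SCMA. The names `copy_gated_distribution` and `copy_gate` are hypothetical, and the gate is passed in as a fixed scalar, whereas a trained model would predict it at every decoding step.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_gated_distribution(
    vocab: List[str],
    vocab_logits: List[float],
    existing_caption: List[str],
    attention_logits: List[float],
    copy_gate: float,
) -> Dict[str, float]:
    """Blend 'generate from vocabulary' with 'copy from the existing caption'.

    Hypothetical sketch: copy_gate in [0, 1] is the probability of copying.
    P(w) = (1 - copy_gate) * P_vocab(w) + copy_gate * sum of attention on w.
    """
    if not 0.0 <= copy_gate <= 1.0:
        raise ValueError("copy_gate must lie in [0, 1]")
    if len(vocab) != len(vocab_logits):
        raise ValueError("vocab and vocab_logits must have the same length")
    if len(existing_caption) != len(attention_logits):
        raise ValueError("existing_caption and attention_logits must match")

    p_vocab = softmax(vocab_logits)
    p_attn = softmax(attention_logits)

    mixed: Dict[str, float] = {}
    # Generation path: probability mass spread over the vocabulary.
    for word, p in zip(vocab, p_vocab):
        mixed[word] = mixed.get(word, 0.0) + (1.0 - copy_gate) * p
    # Copy path: attention mass over caption positions; a word appearing
    # several times accumulates mass from every position it occupies.
    for word, a in zip(existing_caption, p_attn):
        mixed[word] = mixed.get(word, 0.0) + copy_gate * a
    return mixed


if __name__ == "__main__":
    vocab = ["a", "dog", "cat", "on", "grass"]
    caption = ["a", "cat", "on", "grass"]
    dist = copy_gated_distribution(vocab, [0.0] * 5, caption, [0.0] * 4, 0.5)
    for word, p in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {p:.3f}")
```

With uniform scores and `copy_gate=0.5`, words already in the caption ("a", "cat", "on", "grass") receive more mass than "dog", which can only be generated. This illustrates why a copy path makes it cheap to keep the parts of a caption that are already correct.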

research
09/07/2019

Look and Modify: Modification Networks for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely used ...
research
07/20/2022

Explicit Image Caption Editing

Given an image and a reference caption, the image caption editing task a...
research
12/14/2020

Intrinsic Image Captioning Evaluation

The image captioning task is about to generate suitable descriptions fro...
research
06/29/2021

Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder

Automatically evaluating the quality of image captions can be very chall...
research
06/18/2019

Expressing Visual Relationships via Language

Describing images with text is a fundamental problem in vision-language ...
research
02/23/2021

Enhanced Modality Transition for Image Captioning

Image captioning model is a cross-modality knowledge discovery task, whi...
research
11/02/2020

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a...
