Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning

02/04/2023
by   Jingqiang Chen, et al.

Coherent entity-aware multi-image captioning aims to generate coherent captions for multiple adjacent images in a news document. There are coherence relationships among adjacent images because they often describe the same entities or events. These relationships are important for entity-aware multi-image captioning, but are neglected in entity-aware single-image captioning. Most existing work focuses on single-image captioning, while multi-image captioning has not been explored before. Hence, this paper proposes a coherent entity-aware multi-image captioning model that makes use of coherence relationships. The model consists of a Transformer-based caption generation model and two types of contrastive learning-based coherence mechanisms. The generation model generates the caption by attending to the image and the accompanying text. The horizontal coherence mechanism aims to make the caption coherent with the captions of adjacent images. The vertical coherence mechanism aims to make the caption coherent with the image and the accompanying text. To evaluate coherence between captions, two coherence evaluation metrics are proposed. A new dataset, DM800K, is constructed; it has more images per document than the two existing datasets GoodNews and NYT800K, and is therefore more suitable for multi-image captioning. Experiments on the three datasets show that the proposed captioning model outperforms six baselines according to single-image captioning evaluations, and that the generated captions are more coherent than those of the baselines according to coherence evaluations and human evaluations.
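The abstract does not give the form of the contrastive objective, so the following is only a minimal sketch of what a contrastive coherence loss could look like, assuming an InfoNCE-style formulation over fixed-size embedding vectors. The function names, the cosine similarity, the temperature value, and the choice of positives and negatives are all illustrative assumptions, not the authors' method. For a horizontal mechanism, the positive might be the embedding of an adjacent image's caption and the negatives captions from other documents; for a vertical one, the positive might be an embedding of the image and its accompanying text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's stated loss).

    anchor    -- embedding of the caption being generated (hypothetical)
    positive  -- embedding it should be coherent with
    negatives -- embeddings it should be pushed away from
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    # Negative log of the softmax probability assigned to the positive.
    return log_denominator - pos_logit


# Toy 2-d embeddings: the caption is close to its neighbour's caption
# and far from two unrelated captions.
caption = [1.0, 0.0]
adjacent_caption = [1.0, 0.1]
unrelated_captions = [[0.0, 1.0], [-1.0, 0.0]]

coherent_loss = info_nce(caption, adjacent_caption, unrelated_captions)
incoherent_loss = info_nce(caption, [0.0, 1.0], [adjacent_caption, [-1.0, 0.0]])
```

In this toy setup the loss is smaller when the caption embedding is close to its designated positive than when the positive is an unrelated vector, which is the behaviour a coherence objective of this kind is meant to reward.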


