Multi-Image Summarization: Textual Summary from a Set of Cohesive Images

06/15/2020
by Nicholas Trieu, et al.

Multi-sentence summarization is a well-studied problem in NLP, and generating a description for a single image is a well-studied problem in computer vision. However, for applications such as image cluster labeling or web page summarization, summarizing a set of images is also a useful and challenging task. This paper proposes the new task of multi-image summarization, which aims to generate a concise and descriptive textual summary from a coherent set of input images. We propose a model that extends a Transformer-based image-captioning architecture from single images to multiple images. A dense average image feature aggregation network allows the model to focus on a coherent subset of attributes across the input images. We explore various input representations for the Transformer network and show empirically that aggregated image features outperform individual image embeddings. We further show that pretraining the model parameters on a single-image captioning task improves performance and appears to be particularly effective at eliminating hallucinations in the output.
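The abstract describes the aggregation network only at a high level, so the sketch below is one plausible reading rather than the authors' implementation. It assumes each image has already been encoded as a fixed-length feature vector, that a single dense layer with ReLU is shared across all images, and that the projected vectors are averaged element-wise into one set-level vector. For comparison it also builds the alternative "individual embeddings" input (one vector per image). All function names and toy values are hypothetical.

```python
from typing import List

# Assumptions (not from the paper): precomputed per-image features, one shared
# dense+ReLU layer, and an element-wise mean over the image set.
Vector = List[float]
Matrix = List[List[float]]  # one row per output unit, one column per input feature


def dense_relu(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Hypothetical shared dense layer: ReLU(W @ x + b) for one image feature."""
    if any(len(row) != len(x) for row in weights) or len(bias) != len(weights):
        raise ValueError("weight/bias shapes do not match the input feature")
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bias_i)
            for row, bias_i in zip(weights, bias)]


def individual_embeddings(features: List[Vector], weights: Matrix,
                          bias: Vector) -> List[Vector]:
    """Baseline input: one projected vector per image (N tokens for N images)."""
    return [dense_relu(f, weights, bias) for f in features]


def aggregate_image_features(features: List[Vector], weights: Matrix,
                             bias: Vector) -> Vector:
    """Aggregated input: project every image with the same dense layer, then
    take the element-wise mean, giving one vector independent of image order."""
    if not features:
        raise ValueError("need at least one image feature")
    projected = individual_embeddings(features, weights, bias)
    return [sum(column) / len(projected) for column in zip(*projected)]


# Toy example (made-up values): 2 images, 3-dim features, 2 output units.
toy_w = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 1.0]]
toy_b = [0.0, 0.0]
toy_feats = [[1.0, 2.0, 3.0],
             [3.0, 0.0, -1.0]]

print(individual_embeddings(toy_feats, toy_w, toy_b))     # [[1.0, 5.0], [3.0, 0.0]]
print(aggregate_image_features(toy_feats, toy_w, toy_b))  # [2.0, 2.5]
```

Because the mean is taken over the set, reordering the images leaves the aggregated vector unchanged, and a set containing one image reduces to that image's projected feature. In this sketch, that second property is what lets single-image data flow through the same layers; the abstract does not say how the pretrained single-image captioning weights are actually transferred.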

research
02/04/2023

Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning

Coherent entity-aware multi-image captioning aims to generate coherent c...
research
08/05/2023

A Comprehensive Analysis of Real-World Image Captioning and Scene Identification

Image captioning is a computer vision task that involves generating natu...
research
04/29/2020

Image Captioning through Image Transformer

Automatic captioning of images is a task that combines the challenges of...
research
11/20/2016

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Recent progress on image captioning has made it possible to generate nov...
research
08/05/2021

Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning

Existing image captioning methods just focus on understanding the relati...
research
09/03/2018

Diverse and Coherent Paragraph Generation from Images

Paragraph generation from images, which has gained popularity recently, ...
research
11/19/2015

Order-Embeddings of Images and Language

Hypernymy, textual entailment, and image captioning can be seen as speci...
