Visually-Aware Context Modeling for News Image Captioning

08/16/2023
by   Tingyu Qu, et al.

News Image Captioning aims to generate a caption for an image based on the content of both the image and its accompanying news article. To leverage the visual information effectively, it is important to exploit the connection between the context in the articles/captions and the images. Psychological studies indicate that human faces in images receive higher attention priority. Moreover, humans often play a central role in news stories, as evidenced by the face-name co-occurrence pattern we discover in existing News Image Captioning datasets. Therefore, we design a face-naming module that links faces in images with names in captions/articles to learn better name embeddings. Apart from names, which can be directly linked to an image area (faces), news image captions mostly contain context information that can only be found in the article. Humans typically address this by searching the article for relevant information based on the image. To emulate this thought process, we design a retrieval strategy that uses CLIP to retrieve sentences that are semantically close to the image. We conduct extensive experiments to demonstrate the efficacy of our framework. Without using additional paired data, we establish new state-of-the-art performance on two News Image Captioning datasets, exceeding the previous state of the art by 5 CIDEr points. We will release code upon acceptance.
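The retrieval idea in the abstract (picking article sentences that are semantically close to the image) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that image and sentence embeddings have already been computed in a shared space (for example by a CLIP-style encoder), and it uses small hand-written toy vectors in their place. The function names `cosine` and `retrieve_sentences`, the parameter `k`, and the choice to return the selected sentences in article order are all hypothetical, introduced only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def retrieve_sentences(image_emb, sentences, sentence_embs, k=2):
    """Return the k sentences most similar to the image embedding.

    Assumption: embeddings live in a shared image-text space.
    The selected sentences are returned in their original article order
    (an illustrative design choice, not taken from the paper).
    """
    scored = [(cosine(image_emb, emb), idx) for idx, emb in enumerate(sentence_embs)]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    keep = sorted(idx for _, idx in top)
    return [sentences[i] for i in keep]


# Toy data standing in for real encoder outputs (hypothetical values).
image_emb = [1.0, 0.0, 0.0]
sentences = [
    "The mayor spoke at the opening ceremony.",
    "Stock markets closed lower on Friday.",
    "Crowds gathered outside the town hall.",
]
sentence_embs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.0, 0.7],
]

selected = retrieve_sentences(image_emb, sentences, sentence_embs, k=2)
print(selected)
```

In this toy setup the first and third sentences score highest against the image vector, so they are the ones kept. With a real encoder, the toy vectors would be replaced by embeddings of the image and of each article sentence.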

research
09/07/2021

Journalistic Guidelines Aware News Image Captioning

The task of news article image captioning aims to generate descriptive a...
research
04/17/2020

Transform and Tell: Entity-Aware News Image Captioning

We propose an end-to-end model which generates captions for images embed...
research
12/01/2022

Focus! Relevant and Sufficient Context Selection for News Image Captioning

News Image Captioning requires describing an image by leveraging additio...
research
04/02/2019

Good News, Everyone! Context driven entity-aware captioning for news images

Current image captioning systems perform at a merely descriptive level, ...
research
06/01/2023

"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning

Well-formed context aware image captions and tags in enterprise content ...
research
12/31/2015

Event Specific Multimodal Pattern Mining with Image-Caption Pairs

In this paper we describe a novel framework and algorithms for discoveri...
research
01/26/2023

Style-Aware Contrastive Learning for Multi-Style Image Captioning

Existing multi-style image captioning methods show promising results in ...
