Visual Semantic Relatedness Dataset for Image Captioning

01/20/2023
by Ahmed Sabir, et al.

Modern image captioning systems rely heavily on extracting knowledge from images to capture the concept of a static story. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available COCO Captions dataset (Lin et al., 2014) is extended with information about the scene (such as objects in the image). Since this information has a textual form, it can be used to incorporate any NLP task, such as text similarity or semantic-relatedness methods, into captioning systems, either as an end-to-end training strategy or as a post-processing approach.
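
As a concrete illustration of the post-processing route mentioned above, the sketch below re-ranks candidate captions by their embedding similarity to the textual visual context (object labels from the scene). The encoder choice, helper function, and example data are illustrative assumptions, not the method from the paper.

```python
# A minimal sketch of caption re-ranking with a semantic-relatedness score.
# The encoder ("all-MiniLM-L6-v2"), the example captions, and the visual
# context string are hypothetical stand-ins for illustration only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def rerank_captions(captions, visual_context):
    """Sort candidate captions by cosine similarity to the textual visual context."""
    context_emb = model.encode(visual_context, convert_to_tensor=True)
    caption_embs = model.encode(captions, convert_to_tensor=True)
    # Cosine similarity between each caption and the visual-context embedding.
    scores = util.cos_sim(caption_embs, context_emb).squeeze(-1)
    return sorted(zip(captions, scores.tolist()), key=lambda pair: -pair[1])

# Visual context: object labels a detector might extract from the image.
visual_context = "dog frisbee grass park"
captions = [
    "a dog catches a frisbee in a park",
    "a man rides a surfboard on a wave",
    "a brown dog runs across the grass",
]
for caption, score in rerank_captions(captions, visual_context):
    print(f"{score:.3f}  {caption}")
```

In an end-to-end setting, the same relatedness signal could instead be fed to the captioning model as an additional training input rather than used as a re-ranking score.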

Related research

04/21/2020 · Textual Visual Semantic Dataset for Text Spotting
Text Spotting in the wild consists of detecting and recognizing text app...

09/16/2022 · Belief Revision based Caption Re-ranker with Visual Semantic Information
In this work, we focus on improving the captions generated by image-capt...

05/15/2019 · Aligning Visual Regions and Textual Concepts: Learning Fine-Grained Image Representations for Image Captioning
In image-grounded text generation, fine-grained representations of the i...

04/27/2023 · Learning Human-Human Interactions in Images from Weak Textual Supervision
Interactions between humans are diverse and context-dependent, but previ...

09/11/2018 · End-to-end Image Captioning Exploits Multimodal Distributional Similarity
We hypothesize that end-to-end neural image captioning systems work seem...

08/29/2019 · Aesthetic Image Captioning From Weakly-Labelled Photographs
Aesthetic image captioning (AIC) refers to the multi-modal task of gener...

07/01/2021 · Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring
Camera-based passive dietary intake monitoring is able to continuously c...