
Visual Semantic Relatedness Dataset for Image Captioning

by   Ahmed Sabir, et al.

Modern image captioning systems rely heavily on extracting knowledge from images to capture the concept of a static story. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available COCO Captions dataset (Lin et al., 2014) is extended with textual information about the scene (such as the objects present in the image). Since this information has a textual form, it can be used to incorporate any NLP task, such as text similarity or semantic relatedness methods, into captioning systems, either as an end-to-end training strategy or as a post-processing approach.
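To illustrate the post-processing idea, the sketch below re-ranks candidate captions by their textual relatedness to the scene information. This is a minimal illustration, not the paper's method: it assumes the visual context is a plain string of object labels and uses a simple bag-of-words cosine similarity in place of a learned semantic-relatedness model.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def rerank_captions(visual_context: str, captions: list[str]) -> list[str]:
    """Order candidate captions by relatedness to the textual visual context.

    `visual_context` stands in for the dataset's scene information
    (e.g. detected object labels); any stronger text-similarity model
    could replace cosine_similarity here.
    """
    return sorted(captions,
                  key=lambda c: cosine_similarity(visual_context, c),
                  reverse=True)

# Hypothetical example: object labels extracted from an image.
context = "dog frisbee grass"
candidates = ["a plate of food on a table",
              "a dog catches a frisbee in the park"]
print(rerank_captions(context, candidates)[0])
```

In an end-to-end setting the same textual context would instead be fed to the model during training; the re-ranking variant above only touches the decoding stage.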





Code Repositories


Visual Semantic Relatedness Dataset for Image Captioning.
