Scene-based Factored Attention for Image Captioning

08/07/2019
by Chen Shen, et al.

Image captioning has attracted ever-increasing research attention in the multimedia community. Most cutting-edge work relies on an encoder-decoder framework with attention mechanisms, which has achieved remarkable progress. However, such a framework does not use scene concepts to guide attention over visual information, which leads to sentence bias in caption generation and degrades performance accordingly. We argue that scene concepts capture higher-level visual semantics and serve as an important cue for describing images. In this paper, we propose a novel scene-based factored attention module for image captioning. Specifically, the proposed module first embeds the scene concepts explicitly into factored weights and uses them to attend to the visual information extracted from the input image. Then, an adaptive LSTM generates captions conditioned on the specific scene type. Experimental results on the Microsoft COCO benchmark show that the proposed scene-based attention module substantially improves model performance, outperforming state-of-the-art approaches under various evaluation metrics.
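
As a rough illustration of the mechanism sketched in the abstract, the PyTorch-style snippet below shows one way a scene concept distribution could be embedded into factored weights that modulate region-level attention before decoding. All module names, dimensions, and the use of a Places365-style scene classifier output are illustrative assumptions, not the authors' implementation.

```python
# A minimal, hypothetical sketch of scene-based factored attention.
# Layer names and dimensions are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneFactoredAttention(nn.Module):
    def __init__(self, region_dim=2048, hidden_dim=512, scene_dim=365, factor_dim=512):
        super().__init__()
        # Project image regions and the decoder hidden state into a shared space.
        self.region_proj = nn.Linear(region_dim, factor_dim, bias=False)
        self.hidden_proj = nn.Linear(hidden_dim, factor_dim, bias=False)
        # Embed the scene concept distribution into factored weights that
        # rescale the joint attention features (the "factored" part).
        self.scene_factor = nn.Linear(scene_dim, factor_dim, bias=False)
        self.score = nn.Linear(factor_dim, 1, bias=False)

    def forward(self, regions, hidden, scene_probs):
        # regions:     (B, K, region_dim)  region-level CNN features
        # hidden:      (B, hidden_dim)     current LSTM hidden state
        # scene_probs: (B, scene_dim)      scene classifier output (e.g. Places365)
        joint = torch.tanh(self.region_proj(regions) +
                           self.hidden_proj(hidden).unsqueeze(1))            # (B, K, F)
        factor = torch.sigmoid(self.scene_factor(scene_probs)).unsqueeze(1)  # (B, 1, F)
        logits = self.score(joint * factor).squeeze(-1)                      # (B, K)
        alpha = F.softmax(logits, dim=-1)                                    # attention weights
        context = (alpha.unsqueeze(-1) * regions).sum(dim=1)                 # (B, region_dim)
        return context, alpha
```

In a full captioning model, the returned context vector would presumably be fed to the LSTM decoder at each time step alongside the previous word embedding, and the scene probabilities could additionally gate the LSTM to realize the "adaptive LSTM" behavior the abstract describes.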

