Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

12/15/2016
by Hao Liu, et al.

With the success of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning, a.k.a. image description, has advanced remarkably in recent years. Nonetheless, most existing paradigms may suffer from two deficiencies: a lack of invariance to images with different scaling, rotation, etc., and a lack of effective integration of standalone attention into a holistic end-to-end system. In this paper, we propose a novel image captioning architecture, termed Recurrent Image Captioner (RIC), which allows the visual encoder and the language decoder to cooperate coherently in a recurrent manner. Specifically, we first equip the CNN-based visual encoder with a differentiable layer to enable spatially invariant transformation of visual signals. Moreover, we deploy a differentiable attention filter module between the encoder and the decoder to dynamically determine salient visual parts. We also employ a bidirectional LSTM to preprocess sentences and generate better textual representations. In addition, we propose to exploit variational inference to optimize the whole architecture. Extensive experimental results on three benchmark datasets (i.e., Flickr8k, Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture compared to most state-of-the-art methods.
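
As a rough illustration of what "dynamically determining salient visual parts" can mean, the sketch below implements generic dot-product soft attention over a handful of region feature vectors. It is not the RIC attention filter itself, whose details are not given in the abstract. The function names, the toy feature vectors, and the use of a query vector as a stand-in for the decoder state are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, query):
    """Generic soft attention (an assumption, not the paper's exact module).

    region_features: list of equal-length vectors, one per image region.
    query: vector of the same length, a hypothetical stand-in for the
           decoder state at the current step.
    Returns (weights, context), where context is the weighted sum of regions.
    """
    # Score each region by its dot product with the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    dim = len(query)
    # Blend region features using the attention weights.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three regions with two-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(regions, [2.0, 0.0])
```

Because every step here is a smooth function of its inputs, a module of this kind can be trained jointly with an encoder and a decoder by gradient descent, which is the property the abstract highlights when it calls the attention filter differentiable.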


