Recurrent Fusion Network for Image Captioning

07/26/2018
by   Wenhao Jiang, et al.
0

Recently, much advance has been made in image captioning, and an encoder-decoder framework has been adopted by all the state-of-the-art models. Under this framework, an input image is encoded by a convolutional neural network (CNN) and then translated into natural language with a recurrent neural network (RNN). The existing models counting on this framework employ only one kind of CNNs, e.g., ResNet or Inception-X, which describes the image contents from only one specific view point. Thus, the semantic meaning of the input image cannot be comprehensively understood, which restricts improving the performance. In this paper, to exploit the complementary information from multiple encoders, we propose a novel recurrent fusion network (RFNet) for the image captioning task. The fusion process in our model can exploit the interactions among the outputs of the image encoders and generate new compact and informative representations for the decoder. Experiments on the MSCOCO dataset demonstrate the effectiveness of our proposed RFNet, which sets a new state-of-the-art for image captioning.

READ FULL TEXT

page 19

page 20

research
02/14/2021

Improved Bengali Image Captioning via deep convolutional neural network based encoder-decoder model

Image Captioning is an arduous task of producing syntactically and seman...
research
06/26/2017

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Image captioning has been recently gaining a lot of attention thanks to ...
research
05/25/2023

HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning

A great deal of progress has been made in image captioning, driven by re...
research
01/06/2023

An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU)

Image captioning by the encoder-decoder framework has shown tremendous a...
research
03/18/2017

Recurrent Models for Situation Recognition

This work proposes Recurrent Neural Network (RNN) models to predict stru...
research
05/07/2015

Language Models for Image Captioning: The Quirks and What Works

Two recent approaches have achieved state-of-the-art results in image ca...
research
03/08/2021

Analysis of Convolutional Decoder for Image Caption Generation

Recently Convolutional Neural Networks have been proposed for Sequence M...

Please sign up or login with your details

Forgot password? Click here to reset