Phrase-based Image Captioning with Hierarchical LSTM Model

11/11/2017
by Ying Hua Tan, et al.

Automatic generation of captions that describe the content of an image has been gaining a lot of research interest recently, and most existing works treat an image caption as purely sequential data. Natural language, however, possesses a hierarchical temporal structure, with complex dependencies between its subsequences. In this paper, we propose a phrase-based hierarchical Long Short-Term Memory (phi-LSTM) model to generate image descriptions. In contrast to conventional solutions that generate a caption in a purely sequential manner, our proposed model decodes an image caption from phrase to sentence. It consists of a phrase decoder at the bottom of the hierarchy, which decodes noun phrases of variable length, and an abbreviated sentence decoder at the upper level, which decodes an abbreviated form of the image description. A complete image caption is formed by combining the generated phrases with the sentence during the inference stage. Empirically, our proposed model achieves better or competitive results on the Flickr8k, Flickr30k and MS-COCO datasets in comparison to state-of-the-art models. We also show that our proposed model is able to generate more novel captions (not seen in the training data), which are richer in word content, on all three datasets.
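The phrase-to-sentence decomposition described above can be illustrated at the level of token bookkeeping. The Python sketch below is not the authors' implementation and leaves out both LSTM decoders entirely. It assumes a hypothetical placeholder token `<NP>` that marks each noun-phrase slot in the abbreviated sentence, and it assumes noun-phrase spans are supplied by some external chunker. It only shows how a caption might be split into an abbreviated sentence plus its noun phrases, and how generated phrases could be merged back into a complete caption at inference.

```python
from typing import List, Tuple

# Hypothetical placeholder marking a noun-phrase slot in the abbreviated
# sentence; the token scheme used by phi-LSTM itself may differ.
PHRASE_TOKEN = "<NP>"


def abbreviate(
    words: List[str], np_spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[List[str]]]:
    """Split a tokenized caption into an abbreviated sentence and its noun phrases.

    np_spans holds (start, end) word indices with end exclusive, assumed to be
    sorted and non-overlapping (e.g. produced by an external chunker).
    """
    abbreviated: List[str] = []
    phrases: List[List[str]] = []
    pos = 0
    for start, end in np_spans:
        abbreviated.extend(words[pos:start])  # words between phrases stay as-is
        abbreviated.append(PHRASE_TOKEN)      # the whole phrase becomes one slot
        phrases.append(words[start:end])      # the phrase itself is kept separately
        pos = end
    abbreviated.extend(words[pos:])           # trailing words after the last phrase
    return abbreviated, phrases


def combine(abbreviated: List[str], phrases: List[List[str]]) -> List[str]:
    """Fill each placeholder slot, in order, with the next generated phrase."""
    slots = sum(1 for tok in abbreviated if tok == PHRASE_TOKEN)
    if slots != len(phrases):
        raise ValueError(f"{slots} phrase slots but {len(phrases)} phrases")
    remaining = iter(phrases)
    caption: List[str] = []
    for tok in abbreviated:
        if tok == PHRASE_TOKEN:
            caption.extend(next(remaining))
        else:
            caption.append(tok)
    return caption


if __name__ == "__main__":
    words = "a black dog is running on the grass".split()
    abbr, phrases = abbreviate(words, [(0, 3), (6, 8)])
    print(abbr)     # ['<NP>', 'is', 'running', 'on', '<NP>']
    print(phrases)  # [['a', 'black', 'dog'], ['the', 'grass']]
    print(" ".join(combine(abbr, phrases)))  # a black dog is running on the grass
```

In this toy setting the abbreviated sentence is much shorter than the full caption, which hints at why a sentence-level decoder over phrase slots might have an easier sequence to model; how phi-LSTM actually represents and selects phrases is described in the full paper, not here.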

Related research

phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning (08/20/2016)
A picture is worth a thousand words. Not until recently, however, we not...

Phrase-based Image Captioning (02/12/2015)
Generating a novel textual description of an image is an interesting pro...

Simple Image Description Generator via a Linear Phrase-Based Approach (12/29/2014)
Generating a novel textual description of an image is an interesting pro...

Self-Adaptive Hierarchical Sentence Model (04/20/2015)
The ability to accurately model a sentence at varying stages (e.g., word...

ResCap V1: Deep Residual Learning Based Image Captioning (02/14/2020)
Image Captioning alludes to the process of generating text description f...

CIDEr-R: Robust Consensus-based Image Description Evaluation (09/28/2021)
This paper shows that CIDEr-D, a traditional evaluation metric for image...

Research on Annotation Rules and Recognition Algorithm Based on Phrase Window (07/07/2020)
At present, most Natural Language Processing technology is based on the ...