phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

08/20/2016
by   Ying Hua Tan, et al.
0

A picture is worth a thousand words. Not until recently, however, we noticed some success stories in understanding of visual scenes: a model that is able to detect/name objects, describe their attributes, and recognize their relationships/interactions. In this paper, we propose a phrase-based hierarchical Long Short-Term Memory (phi-LSTM) model to generate image description. The proposed model encodes sentence as a sequence of combination of phrases and words, instead of a sequence of words alone as in those conventional solutions. The two levels of this model are dedicated to i) learn to generate image relevant noun phrases, and ii) produce appropriate image description from the phrases and other words in the corpus. Adopting a convolutional neural network to learn image features and the LSTM to learn the word sequence in a sentence, the proposed model has shown better or competitive results in comparison to the state-of-the-art models on Flickr8k and Flickr30k datasets.

READ FULL TEXT

page 17

page 18

page 19

research
11/11/2017

Phrase-based Image Captioning with Hierarchical LSTM Model

Automatic generation of caption to describe the content of an image has ...
research
02/12/2015

Phrase-based Image Captioning

Generating a novel textual description of an image is an interesting pro...
research
02/18/2020

An enhanced Tree-LSTM architecture for sentence semantic modeling using typed dependencies

Tree-based Long short term memory (LSTM) network has become state-of-the...
research
12/29/2014

Simple Image Description Generator via a Linear Phrase-Based Approach

Generating a novel textual description of an image is an interesting pro...
research
10/01/2020

How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text

Long Short-Term Memory recurrent neural network (LSTM) is widely used an...
research
05/22/2017

Learning to Associate Words and Images Using a Large-scale Graph

We develop an approach for unsupervised learning of associations between...
research
01/16/2018

Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs

The driving force behind the recent success of LSTMs has been their abil...

Please sign up or login with your details

Forgot password? Click here to reset