Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

12/20/2014
by   Junhua Mao, et al.
0

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. The project page of this work is: www.stat.ucla.edu/ junhua.mao/m-RNN.html .

READ FULL TEXT

page 2

page 11

research
10/04/2014

Explain Images with Multimodal Recurrent Neural Networks

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) ...
research
02/05/2016

Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features

Generating natural language descriptions for images is a challenging tas...
research
02/20/2020

Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching

Existing image-text matching approaches typically infer the similarity o...
research
06/09/2019

Attention-based Conditioning Methods for External Knowledge Integration

In this paper, we present a novel approach for incorporating external kn...
research
06/02/2016

Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network

Visual storytelling aims to generate human-level narrative language (i.e...
research
03/27/2017

Where to put the Image in an Image Caption Generator

When a neural language model is used for caption generation, the image i...
research
08/29/2016

Optimizing Recurrent Neural Networks Architectures under Time Constraints

Recurrent neural network (RNN)'s architecture is a key factor influencin...

Please sign up or login with your details

Forgot password? Click here to reset