Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

08/17/2017
by   Ting Yao, et al.
0

Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, making the recent captioning models limited in their ability to describe objects outside of training corpora (i.e., novel objects). In this paper, we present Long Short-Term Memory with Copying Mechanism (LSTM-C) --- a new architecture that incorporates copying into the Convolutional Neural Networks (CNN) plus Recurrent Neural Networks (RNN) image captioning framework, for describing novel objects in captions. Specifically, freely available object recognition datasets are leveraged to develop classifiers for novel objects. Our LSTM-C then nicely integrates the standard word-by-word sentence generation by a decoder RNN with copying mechanism which may instead select words from novel objects at proper places in the output sentence. Extensive experiments are conducted on both MSCOCO image captioning and ImageNet datasets, demonstrating the ability of our proposed LSTM-C architecture to describe novel objects. Furthermore, superior results are reported when compared to state-of-the-art deep models.

READ FULL TEXT
research
04/25/2019

Pointing Novel Objects in Image Captioning

Image captioning has received significant attention with remarkable impr...
research
11/05/2016

Boosting Image Captioning with Attributes

Automatically describing an image with a natural language has been an em...
research
04/04/2016

Image Captioning with Deep Bidirectional LSTMs

This work presents an end-to-end trainable deep bidirectional LSTM (Long...
research
09/15/2017

Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning

In this paper, a self-guiding multimodal LSTM (sg-LSTM) image captioning...
research
12/02/2021

Object-Centric Unsupervised Image Captioning

Training an image captioning model in an unsupervised manner without uti...
research
04/21/2020

ParaCNN: Visual Paragraph Generation via Adversarial Twin Contextual CNNs

Image description generation plays an important role in many real-world ...
research
11/23/2016

Video Captioning with Transferred Semantic Attributes

Automatically generating natural language descriptions of videos plays a...

Please sign up or login with your details

Forgot password? Click here to reset