Pointing Novel Objects in Image Captioning

04/25/2019
by Yehao Li, et al.

Image captioning has received significant attention and has improved remarkably in recent years. Nevertheless, images in the wild encapsulate rich knowledge and cannot be sufficiently described by models trained on image-caption pairs that contain only in-domain objects. In this paper, we propose to address the problem by augmenting standard deep captioning architectures with object learners. Specifically, we present Long Short-Term Memory with Pointing (LSTM-P), a new architecture that facilitates vocabulary expansion and produces novel objects via a pointing mechanism. Technically, object learners are first pre-trained on available object recognition data. At each time step of the decoding stage, the pointing mechanism in LSTM-P then balances the probability of generating a word through the LSTM against the probability of copying a word from the recognized objects. Furthermore, our captioning model encourages global coverage of objects in the sentence. Extensive experiments are conducted on both the held-out COCO image captioning dataset and ImageNet for describing novel objects, and superior results are reported compared to state-of-the-art approaches. More remarkably, we obtain an average F1 score of 60.9 on the held-out COCO dataset.
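The balance between generating and copying described in the abstract can be illustrated with a small sketch. It is not the paper's formulation: the abstract gives no equations, so the scalar `gate`, the softmax over copy scores, and the function name `pointing_mixture` are all assumptions chosen for illustration. The sketch mixes a distribution over the caption vocabulary with a distribution over recognized object words, so that an object word absent from the caption vocabulary can still receive probability mass.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointing_mixture(gen_logits, gen_vocab, copy_scores, object_vocab, gate):
    """Blend a 'generate' and a 'copy' distribution for one decoding step.

    Hypothetical helper, not the paper's equations.
    gate is the assumed weight on copying, in [0, 1]; in a trained model it
    would be predicted at each time step rather than passed in by hand.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    p_gen = softmax(gen_logits)    # distribution over caption vocabulary
    p_copy = softmax(copy_scores)  # distribution over recognized objects
    mixed = {}
    for word, p in zip(gen_vocab, p_gen):
        mixed[word] = mixed.get(word, 0.0) + (1.0 - gate) * p
    for word, p in zip(object_vocab, p_copy):
        mixed[word] = mixed.get(word, 0.0) + gate * p
    return mixed


# Toy step: "zebra" is a novel object that never appears in the caption vocabulary.
gen_vocab = ["a", "dog", "on", "grass"]
object_vocab = ["dog", "zebra"]
step = pointing_mixture([2.0, 1.0, 0.5, 0.1], gen_vocab,
                        [0.3, 2.5], object_vocab, gate=0.4)
print(sorted(step.items(), key=lambda kv: -kv[1]))
```

With `gate=0.0` the sketch reduces to ordinary generation and the novel word gets zero probability; any positive gate lets the recognized object compete with in-vocabulary words.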


research
08/17/2017

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Image captioning often requires a large set of training image-sentence p...
research
03/06/2020

Captioning Images with Novel Objects via Online Vocabulary Expansion

In this study, we introduce a low cost method for generating description...
research
08/06/2019

Cascaded Revision Network for Novel Object Captioning

Image captioning, a challenging task where the machine automatically des...
research
06/02/2021

Learning to Select: A Fully Attentive Approach for Novel Object Captioning

Image captioning models have lately shown impressive results when applie...
research
03/28/2022

NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

Novel object captioning aims at describing objects absent from training ...
research
10/17/2017

Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance

Images in the wild encapsulate rich knowledge about varied abstract conc...
research
12/02/2016

Guided Open Vocabulary Image Captioning with Constrained Beam Search

Existing image captioning models do not generalize well to out-of-domain...
