Guided Open Vocabulary Image Captioning with Constrained Beam Search

12/02/2016
by   Peter Anderson, et al.
0

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-training. Our method uses constrained beam search to force the inclusion of selected tag words in the output, and fixed, pretrained word embeddings to facilitate vocabulary expansion to previously unseen tag words. Using this approach we achieve state of the art results for out-of-domain captioning on MSCOCO (and improved results for in-domain captioning). Perhaps surprisingly, our results significantly outperform approaches that incorporate the same tag predictions into the learning algorithm. We also show that we can significantly improve the quality of generated ImageNet captions by leveraging ground-truth labels.

READ FULL TEXT

page 6

page 8

page 11

page 12

page 13

page 14

research
12/24/2020

SubICap: Towards Subword-informed Image Captioning

Existing Image Captioning (IC) systems model words as atomic units in ca...
research
03/04/2019

COMIC: Towards A Compact Image Captioning Model with Attention

Recent works in image captioning have shown very promising raw performan...
research
03/06/2020

Captioning Images with Novel Objects via Online Vocabulary Expansion

In this study, we introduce a low cost method for generating description...
research
04/25/2019

Pointing Novel Objects in Image Captioning

Image captioning has received significant attention with remarkable impr...
research
12/04/2020

Understanding Guided Image Captioning Performance across Domains

Image captioning models generally lack the capability to take into accou...
research
04/28/2022

Controllable Image Captioning

State-of-the-art image captioners can generate accurate sentences to des...
research
11/23/2016

Semantic Compositional Networks for Visual Captioning

A Semantic Compositional Network (SCN) is developed for image captioning...

Please sign up or login with your details

Forgot password? Click here to reset