Decoupled Novel Object Captioner

04/11/2018
by   Yu Wu, et al.
0

Image captioning is a challenging task where the machine automatically describes an image by sentences or phrases. It often requires a large number of paired image-sentence annotations for training. However, a pre-trained captioning model can hardly be applied to a new domain in which some novel object categories exist, i.e., the objects and their description words are unseen during model training. To correctly caption the novel object, it requires professional human workers to annotate the images by sentences with the novel words. It is labor expensive and thus limits its usage in real-world applications. In this paper, we introduce the zero-shot novel object captioning task where the machine generates descriptions without extra sentences about the novel object. To tackle the challenging problem, we propose a Decoupled Novel Object Captioner (DNOC) framework that can fully decouple the language sequence model from the object descriptions. DNOC has two components. 1) A Sequence Model with the Placeholder (SM-P) generates a sentence containing placeholders. The placeholder represents an unseen novel object. Thus, the sequence model can be decoupled from the novel object descriptions. 2) A key-value object memory built upon the freely available detection model, contains the visual information and the corresponding word for each object. The SM-P will generate a query to retrieve the words from the object memory. The placeholder will then be filled with the correct word, resulting in a caption with novel object descriptions. The experimental results on the held-out MSCOCO dataset demonstrate the ability of DNOC in describing novel concepts in the zero-shot novel object captioning task.

READ FULL TEXT

page 1

page 5

page 8

research
07/31/2019

Image Captioning with Unseen Objects

Image caption generation is a long standing and challenging problem at t...
research
08/06/2019

Cascaded Revision Network for Novel Object Captioning

Image captioning, a challenging task where the machine automatically des...
research
11/17/2015

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

While recent deep neural network models have achieved promising results ...
research
08/13/2021

Detection and Captioning with Unseen Object Classes

Image caption generation is one of the most challenging problems at the ...
research
12/28/2015

Communicating with sentences: A multi-word naming game model

Naming game simulates the process of naming an object by a single word, ...
research
09/15/2017

Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning

In this paper, a self-guiding multimodal LSTM (sg-LSTM) image captioning...
research
08/06/2020

Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards

Generating accurate descriptions for online fashion items is important n...

Please sign up or login with your details

Forgot password? Click here to reset