Intention Oriented Image Captions with Guiding Objects

11/19/2018
by Yue Zheng, et al.

Although existing image caption models can produce promising results using recurrent neural networks (RNNs), it is difficult to guarantee that an object we care about is included in the generated descriptions, for example when the object is inconspicuous in the image. The problem becomes even harder when these objects do not appear in the training stage. In this paper, we propose a novel approach for generating image captions with guiding objects (CGO). When the object of interest is present in the image, CGO constrains the model to include it in the generated description while maintaining fluency. Instead of generating the sequence from left to right, we start the description from a selected object and generate the remaining parts of the sequence conditioned on this object. To achieve this, we design a novel framework combining two LSTMs that run in opposite directions. We demonstrate the characteristics of our method on MSCOCO by generating descriptions for each detected object in an image. With CGO, we can extend descriptions to objects that are neglected in image caption labels and provide a more comprehensive and diverse set of descriptions for an image. CGO shows clear advantages when applied to the task of describing novel objects. We report experimental results on both the MSCOCO and ImageNet datasets. Evaluations show that our method outperforms state-of-the-art models on this task with an average F1 score of 75.8, leading to better descriptions in terms of both content accuracy and fluency.
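The core idea is to anchor decoding on a guiding object and grow the sentence outward in both directions. Below is a minimal sketch of such two-directional decoding, assuming a PyTorch-style implementation with greedy decoding; the class name, layer sizes, and the way the image feature initializes the hidden state are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of captioning around a
# guiding object with two LSTMs running in opposite directions. Names,
# dimensions, and the greedy decoding loop are illustrative assumptions.
import torch
import torch.nn as nn

class TwoWayCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.img_proj = nn.Linear(feat_dim, hidden_dim)   # image feature -> initial hidden state
        self.fwd = nn.LSTMCell(embed_dim, hidden_dim)     # generates words to the right of the object
        self.bwd = nn.LSTMCell(embed_dim, hidden_dim)     # generates words to the left of the object
        self.out = nn.Linear(hidden_dim, vocab_size)

    def _decode(self, cell, img_feat, start_token, max_len, end_token):
        # Condition the LSTM on the image, then greedily emit tokens
        # starting from the guiding object (assumes batch size 1).
        h = torch.tanh(self.img_proj(img_feat))
        c = torch.zeros_like(h)
        token, words = start_token, []
        for _ in range(max_len):
            h, c = cell(self.embed(token), (h, c))
            token = self.out(h).argmax(dim=-1)
            if token.item() == end_token:
                break
            words.append(token.item())
        return words

    def caption_with_object(self, img_feat, obj_token, max_len=15, end_token=0):
        # The left half is generated right-to-left by the backward LSTM, the
        # right half left-to-right by the forward LSTM; the two halves are
        # joined around the guiding object.
        left = self._decode(self.bwd, img_feat, obj_token, max_len, end_token)
        right = self._decode(self.fwd, img_feat, obj_token, max_len, end_token)
        return list(reversed(left)) + [obj_token.item()] + right
```

At inference time one would pass the token id of a detected object (e.g. `obj_token = torch.tensor([word_id])`) together with an image feature of shape `(1, feat_dim)`; because the sentence is grown outward from that token, the object is guaranteed to appear in the output regardless of how conspicuous it is in the image.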

