Rethinking the Form of Latent States in Image Captioning

07/26/2018
by Bo Dai, et al.

RNNs and their variants have been widely adopted for image captioning. In RNNs, the production of a caption is driven by a sequence of latent states. Existing captioning models usually represent latent states as vectors, taking this practice for granted. We rethink this choice and study an alternative formulation, namely using two-dimensional maps to encode latent states. This is motivated by a question: how do spatial structures in the latent states affect the resulting captions? Our study on MSCOCO and Flickr30k leads to two significant observations. First, the formulation with 2D states is generally more effective in captioning, consistently achieving higher performance with comparable parameter sizes. Second, 2D states preserve spatial locality. Taking advantage of this, we visually reveal the internal dynamics of the caption generation process, as well as the connections between the input visual domain and the output linguistic domain.
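
To make the contrast between vector states and 2D states concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the update rule, so the sketch assumes a plain tanh recurrence in which the matrix products of a vector-state RNN are swapped for small convolutions when the state is a map. All names (`conv2d_same`, `vector_step`, `map_step`) and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive zero-padded 'same' 2D cross-correlation.

    x: (C_in, H, W) input map; w: (C_out, C_in, k, k) kernel with odd k.
    Returns an array of shape (C_out, H, W).
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    _, height, width = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, height, width))
    for i in range(k):
        for j in range(k):
            # Mix channels for one kernel offset and accumulate.
            window = xp[:, i:i + height, j:j + width]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], window)
    return out


def vector_step(h, x, W_x, W_h):
    """One recurrent step with a vector state (assumed plain tanh RNN)."""
    return np.tanh(W_x @ x + W_h @ h)


def map_step(H, X, K_x, K_h):
    """One recurrent step with a 2D map state (assumed convolutional update)."""
    return np.tanh(conv2d_same(X, K_x) + conv2d_same(H, K_h))


rng = np.random.default_rng(0)

# Vector state: every output unit depends on every input unit.
d = 16
h = np.zeros(d)
x = rng.standard_normal(d)
W_x = 0.1 * rng.standard_normal((d, d))
W_h = 0.1 * rng.standard_normal((d, d))
h_next = vector_step(h, x, W_x, W_h)

# 2D state: each location depends only on a 3x3 neighbourhood.
channels, side = 4, 7
H = np.zeros((channels, side, side))
X = rng.standard_normal((channels, side, side))
K_x = 0.1 * rng.standard_normal((channels, channels, 3, 3))
K_h = 0.1 * rng.standard_normal((channels, channels, 3, 3))
H_next = map_step(H, X, K_x, K_h)

# Perturb the input at one corner and see which state locations change.
X_perturbed = X.copy()
X_perturbed[:, 0, 0] += 1.0
changed = np.abs(map_step(H, X_perturbed, K_x, K_h) - H_next).max(axis=0) > 1e-12

print(h_next.shape, H_next.shape)
print(changed.astype(int))
```

Under these assumptions, perturbing the input at a single location changes the map state only in that location's immediate neighbourhood, which is one way to read the claim that 2D states preserve spatial locality. A dense vector update has no such structure.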


