Object Counts! Bringing Explicit Detections Back into Image Captioning

04/23/2018
by Josiah Wang, et al.

The use of explicit object detectors as an intermediate step in image captioning, which constituted an essential stage in early work, is often bypassed in the currently dominant end-to-end approaches, where the language model is conditioned directly on a mid-level image embedding. We argue that explicit detections provide rich semantic information, and can thus be used as an interpretable representation to better understand why end-to-end image captioning systems work well. We provide an in-depth analysis of end-to-end image captioning by exploring a variety of cues that can be derived from such object detections. Our study reveals that end-to-end image captioning systems rely on matching image representations to generate captions, and that the frequency, size and position of objects are complementary cues that all play a role in forming a good image representation. It also reveals that different object categories contribute in different ways towards image captioning.
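The abstract does not say how the frequency, size and position cues are encoded. Purely as an illustration, the sketch below shows one hypothetical way to turn a list of detections into a fixed-length image representation that a caption generator could be conditioned on. The `Detection` class, the `encode_detections` function, and the choice of per-category count, largest normalised box area and mean normalised box centre are our assumptions, not the authors' encoding.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    """Hypothetical detector output: a category index plus a box in pixels."""
    category: int
    x: float  # left edge
    y: float  # top edge
    w: float  # box width
    h: float  # box height


def encode_detections(dets: Sequence[Detection], num_categories: int,
                      img_w: float, img_h: float) -> List[float]:
    """Illustrative encoding (an assumption, not the paper's method).

    Returns a vector of length 4 * num_categories laid out as
    [counts | largest box area | mean centre x | mean centre y],
    with areas and positions normalised by the image size.
    """
    C = num_categories
    counts = [0.0] * C
    max_area = [0.0] * C
    sum_cx = [0.0] * C
    sum_cy = [0.0] * C
    for d in dets:
        c = d.category
        # Frequency cue: number of instances of this category.
        counts[c] += 1
        # Size cue: largest instance, as a fraction of the image area.
        max_area[c] = max(max_area[c], (d.w * d.h) / (img_w * img_h))
        # Position cue: accumulate normalised box centres.
        sum_cx[c] += (d.x + d.w / 2) / img_w
        sum_cy[c] += (d.y + d.h / 2) / img_h
    # Average the centres; categories with no detections stay at 0.
    mean_cx = [s / n if n else 0.0 for s, n in zip(sum_cx, counts)]
    mean_cy = [s / n if n else 0.0 for s, n in zip(sum_cy, counts)]
    return counts + max_area + mean_cx + mean_cy


# Usage: two objects of category 0 and one of category 2 in a 100x100 image.
dets = [Detection(0, 0, 0, 50, 50),
        Detection(0, 50, 50, 50, 50),
        Detection(2, 10, 20, 20, 40)]
vec = encode_detections(dets, num_categories=3, img_w=100, img_h=100)
```

Keeping the three cue groups in separate slices makes it easy to drop or ablate one of them (for example, zeroing the position slice) when probing which cues a captioning model relies on.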
