Image captioning with weakly-supervised attention penalty

03/06/2019
by Jiayun Li, et al.

Stories are essential for genealogy research since they can help build emotional connections with people. Many family stories are preserved in historical photos and albums. Recent developments in image captioning models make it feasible to "tell stories" for photos automatically. The attention mechanism has been widely adopted in many state-of-the-art encoder-decoder image captioning models, since it bridges the gap between the visual part and the language part. Most existing captioning models train their attention modules implicitly, using only the word-likelihood loss. Meanwhile, many studies have investigated intrinsic attention in visual models using gradient-based approaches. Ideally, the attention maps predicted by a captioning model should be consistent with the intrinsic attention of the visual model for any given visual concept. However, no prior work has aligned implicitly learned attention maps with intrinsic visual attention. In this paper, we propose a novel model that measures the consistency between the attention predicted by the captioning model and the intrinsic visual attention. This alignment loss allows explicit attention correction without any expensive bounding box annotations. We developed and evaluated our model on the COCO dataset as well as a genealogical dataset from Ancestry.com Operations Inc., which contains billions of historical photos. The proposed model achieved better performance on all commonly used language evaluation metrics for both datasets.
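The abstract does not state the exact form of the alignment loss. The following is a minimal sketch under two assumptions of ours. First, the intrinsic attention is a Grad-CAM-style map, which is one common gradient-based approach and not necessarily the one used in the paper. Second, consistency is measured by the KL divergence between the two spatial maps after normalizing each into a distribution. The function names `grad_cam_map`, `to_distribution`, and `attention_alignment_penalty` are hypothetical.

```python
import numpy as np


def grad_cam_map(features, gradients):
    """Grad-CAM-style intrinsic attention map (assumed gradient-based method).

    features, gradients: arrays of shape (K, H, W) holding K convolutional
    feature maps and the gradients of one word's score with respect to them.
    Returns an (H, W) map.
    """
    # Channel importance: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the feature maps; ReLU keeps only positive evidence.
    cam = np.tensordot(weights, features, axes=1)  # shape (H, W)
    return np.maximum(cam, 0.0)


def to_distribution(attn_map, eps=1e-8):
    """Flatten a non-negative spatial map and normalize it to sum to 1."""
    m = np.maximum(np.asarray(attn_map, dtype=float).ravel(), 0.0) + eps
    return m / m.sum()


def attention_alignment_penalty(pred_attention, intrinsic_map, eps=1e-8):
    """KL(intrinsic || predicted) between two spatial maps (assumed loss form).

    The result is 0 when the normalized maps agree and grows as the
    captioning model's attention drifts away from the intrinsic attention.
    """
    p = to_distribution(intrinsic_map, eps)
    q = to_distribution(pred_attention, eps)
    return float(np.sum(p * np.log(p / q)))
```

During training, such a penalty would presumably be added to the word-likelihood loss with a weighting factor (a hypothetical `lam`), giving `total_loss = word_loss + lam * penalty`, evaluated only for words that name visual concepts. This correction signal comes from the model's own gradients rather than from bounding box annotations.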

research
12/06/2016

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely adopt...
research
12/14/2020

Intrinsic Image Captioning Evaluation

The image captioning task is about to generate suitable descriptions fro...
research
05/31/2016

Attention Correctness in Neural Image Captioning

Attention mechanisms have recently been introduced in deep learning for ...
research
03/04/2019

COMIC: Towards A Compact Image Captioning Model with Attention

Recent works in image captioning have shown very promising raw performan...
research
03/07/2022

Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition

The goal of unpaired image captioning (UIC) is to describe images withou...
research
09/08/2021

RefineCap: Concept-Aware Refinement for Image Captioning

Automatically translating images to texts involves image scene understan...
research
11/29/2019

OptiBox: Breaking the Limits of Proposals for Visual Grounding

The problem of language grounding has attracted much attention in recent...
