Rethinking the Reference-based Distinctive Image Captioning

07/22/2022
by Yangjun Mao, et al.

Distinctive Image Captioning (DIC), which aims to generate captions that describe the unique details of a target image, has received considerable attention over the last few years. A recent DIC work proposes to generate distinctive captions by comparing the target image with a set of semantically similar reference images, i.e., reference-based DIC (Ref-DIC). The goal is for the generated captions to tell the target image apart from the reference images. Unfortunately, the reference images used in existing Ref-DIC works are easy to distinguish: they resemble the target image only at the scene level and share few common objects, so a Ref-DIC model can trivially generate distinctive captions without even considering the reference images. To ensure that Ref-DIC models truly perceive the unique objects (or attributes) in target images, we first propose two new Ref-DIC benchmarks. Specifically, we design a two-stage matching mechanism that strictly controls the similarity between the target and reference images at the object/attribute level (vs. the scene level). Second, to generate distinctive captions, we develop a strong Transformer-based Ref-DIC baseline, dubbed TransDIC. It not only extracts visual features from the target image but also encodes the differences between objects in the target and reference images. Finally, for more trustworthy benchmarking, we propose a new evaluation metric for Ref-DIC, named DisCIDEr, which evaluates both the accuracy and distinctiveness of the generated captions. Experimental results demonstrate that TransDIC generates distinctive captions and outperforms several state-of-the-art models on the two new benchmarks across different metrics.
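The abstract does not give the exact formulation of DisCIDEr, so the following is only a minimal Python sketch of the general idea of jointly scoring accuracy and distinctiveness: the n-gram similarity function, the max-aggregation over captions, and the weight `alpha` are all assumptions for illustration, not the paper's definition.

```python
# Hypothetical sketch of a distinctiveness-aware caption score in the spirit of
# DisCIDEr. The exact metric is not specified in this abstract; the similarity
# measure and weighting below are illustrative assumptions only.
from collections import Counter

def ngrams(text, n=2):
    """Bigram counts of a lowercased, whitespace-tokenized caption."""
    tokens = text.lower().split()
    return Counter(zip(*[tokens[i:] for i in range(n)]))

def overlap(a, b):
    """Simple n-gram overlap in [0, 1] (a stand-in for a CIDEr-style similarity)."""
    ca, cb = ngrams(a), ngrams(b)
    inter = sum((ca & cb).values())
    return inter / max(sum(ca.values()), 1)

def distinctive_score(caption, target_gt_captions, reference_captions, alpha=0.5):
    """Accuracy term (similarity to the target image's ground-truth captions)
    minus a penalty for resembling captions of the reference images."""
    accuracy = max(overlap(caption, gt) for gt in target_gt_captions)
    confusion = max(overlap(caption, ref) for ref in reference_captions)
    return accuracy - alpha * confusion
```

A caption that matches the target image's ground truth while avoiding phrasing that also fits the reference images would score high under this kind of combined criterion, which is the behavior the abstract attributes to DisCIDEr.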


