Learning to Select: A Fully Attentive Approach for Novel Object Captioning

06/02/2021
by   Marco Cagrandi, et al.

Image captioning models have recently shown impressive results on standard datasets. Moving to real-life scenarios, however, remains a challenge because of the larger variety of visual concepts that existing training sets do not cover. For this reason, novel object captioning (NOC) has emerged as a paradigm for testing captioning models on objects that are unseen during training. In this paper, we present a novel approach for NOC that learns to select the most relevant objects in an image, regardless of whether they appear in the training set, and to constrain the generative process of a language model accordingly. Our architecture is fully attentive and end-to-end trainable, even when incorporating constraints. We perform experiments on the held-out COCO dataset, where we demonstrate improvements over the state of the art in terms of both adaptability to novel objects and caption quality.

research
12/20/2018

nocaps: novel object captioning at scale

Image captioning models have achieved impressive results on datasets con...
research
04/25/2019

Pointing Novel Objects in Image Captioning

Image captioning has received significant attention with remarkable impr...
research
06/15/2018

Partially-Supervised Image Captioning

Image captioning models are becoming increasingly successful at describi...
research
09/10/2021

Partially-supervised novel object captioning leveraging context from paired data

In this paper, we propose an approach to improve image captioning soluti...
research
08/06/2019

Cascaded Revision Network for Novel Object Captioning

Image captioning, a challenging task where the machine automatically des...
research
11/14/2015

Oracle performance for visual captioning

The task of associating images and videos with a natural language descri...
research
10/07/2019

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability

The ability to generate natural language explanations conditioned on the...
