Engaging Image Captioning Via Personality

10/25/2018
by   Kurt Shuster, et al.
2

Standard image captioning tasks such as COCO and Flickr30k are factual, neutral in tone and (to a human) state the obvious (e.g., "a man playing a guitar"). While such tasks are useful to verify that a machine understands the content of an image, they are not engaging to humans as captions. With this in mind we define a new task, Personality-Captions, where the goal is to be as engaging to humans as possible by incorporating controllable style and personality traits. We collect and release a large dataset of 201,858 of such captions conditioned over 215 possible traits. We build models that combine existing work from (i) sentence representations (Mazare et al., 2018) with Transformers trained on 1.7 billion dialogue examples; and (ii) image representations (Mahajan et al., 2018) with ResNets trained on 3.5 billion social media images. We obtain state-of-the-art performance on Flickr30k and COCO, and strong performance on our new task. Finally, online evaluations validate that our task and models are engaging to humans, with our best model close to human performance.

READ FULL TEXT

page 6

page 16

page 17

page 18

page 19

page 20

research
03/26/2020

Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models

Image captioning models have been able to generate grammatically correct...
research
10/31/2018

A task in a suit and a tie: paraphrase generation with semantic augmentation

Paraphrasing is rooted in semantics. We show the effectiveness of transf...
research
04/13/2015

Joint Learning of Distributed Representations for Images and Texts

This technical report provides extra details of the deep multimodal simi...
research
05/16/2018

Defoiling Foiled Image Captions

We address the task of detecting foiled image captions, i.e. identifying...
research
11/02/2018

Engaging Image Chat: Modeling Personality in Grounded Dialogue

To achieve the long-term goal of machines being able to engage humans in...
research
11/07/2021

Machine-in-the-Loop Rewriting for Creative Image Captioning

Machine-in-the-loop writing aims to enable humans to collaborate with mo...
research
03/08/2021

Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles

We present ARCH, a computational pathology (CP) multiple instance captio...

Please sign up or login with your details

Forgot password? Click here to reset