SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

05/18/2018
by   Alexander Mathews, et al.

Linguistic style is an essential part of written communication, with the power to affect both clarity and attractiveness. With recent advances in vision and language, we can start to tackle the problem of generating image captions that are both visually grounded and appropriately styled. Existing approaches either require styled training captions aligned to images or generate captions with low relevance. We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images. The core idea of this model, called SemStyle, is to separate semantics and style. One key component is a novel and concise semantic term representation generated using natural language processing techniques and frame semantics. In addition, we develop a unified language model that decodes sentences with diverse word choices and syntax for different styles. Evaluations, both automatic and manual, show that captions from SemStyle preserve image semantics, are descriptive, and are style shifted. More broadly, this work opens the possibility of learning richer image descriptions from the plethora of linguistic data available on the web.
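The two-stage idea described in the abstract, first mapping a descriptive caption to a compact sequence of semantic terms, then realising those terms in a chosen style, can be sketched as follows. This is an illustrative toy only: the lemma table, stopword list, and `<STYLE_STORY>` token below are assumptions for demonstration, not the paper's actual vocabulary, frame-semantic labels, or trained model.

```python
# Illustrative sketch of the semantics/style separation in SemStyle.
# Stage 1: reduce a descriptive caption to an ordered list of semantic terms
# (content words, with inflected forms mapped to lemmas; the real model
# additionally abstracts verbs toward frame-semantic labels).
# Stage 2: a style-conditioned generator turns those terms back into a
# sentence; here it is a stand-in that shows the input a unified language
# model would see, with a style token prefixed to the term sequence.

LEMMAS = {"running": "run", "dogs": "dog", "sat": "sit"}  # toy lemma table
STOPWORDS = {"a", "an", "the", "is", "are", "on", "in"}   # toy function words

def semantic_terms(caption: str) -> list:
    """Keep content words, mapping inflected forms to lemmas."""
    terms = []
    for word in caption.lower().split():
        if word in STOPWORDS:
            continue
        terms.append(LEMMAS.get(word, word))
    return terms

def realise(terms: list, style_token: str) -> str:
    """Stand-in for the styled decoder: prefix the term sequence with a
    style token, as a single style-conditioned model would consume it."""
    return " ".join([style_token] + terms)

caption = "a dog is running on the beach"
terms = semantic_terms(caption)            # ['dog', 'run', 'beach']
print(realise(terms, "<STYLE_STORY>"))     # prints "<STYLE_STORY> dog run beach"
```

Because the term sequence drops style-bearing function words and inflection, the same terms can be decoded into either a descriptive caption or a styled one, which is what lets the model train on styled text without aligned images.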


Related research

08/08/2019: Towards Generating Stylized Image Captions via Adversarial Training
While most image captioning aims to generate objective descriptions of i...

01/30/2018: Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions
Automatic image captioning has recently approached human-level performan...

02/09/2023: A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions
We present a large, multilingual study into how vision constrains lingui...

11/18/2014: From Captions to Visual Concepts and Back
This paper presents a novel approach for automatically generating image ...

06/07/2019: Visually Grounded Neural Syntax Acquisition
We present the Visually Grounded Neural Syntax Learner (VG-NSL), an appr...

02/17/2023: Paint it Black: Generating paintings from text descriptions
Two distinct tasks - generating photorealistic pictures from given text ...

01/20/2021: Towards Understanding How Readers Integrate Charts and Captions: A Case Study with Line Charts
Charts often contain visually prominent features that draw attention to ...
