Iconographic Image Captioning for Artworks

02/07/2021
by Eva Cetinic, et al.

Image captioning implies automatically generating textual descriptions of images based only on the visual input. Although this has been an extensively addressed research topic in recent years, not many contributions have been made in the domain of art historical data. In this particular context, the task of image captioning is confronted with various challenges, such as the lack of large-scale datasets of image-text pairs, the complexity of meaning associated with describing artworks, and the need for expert-level annotations. This work aims to address some of those challenges by utilizing a novel large-scale dataset of artwork images annotated with concepts from the Iconclass classification system designed for art and iconography. The annotations are processed into clean textual descriptions to create a dataset suitable for training a deep neural network model on the image captioning task. Motivated by the state-of-the-art results achieved in generating captions for natural images, a transformer-based vision-language pre-trained model is fine-tuned using the artwork image dataset. Quantitative evaluation of the results is performed using standard image captioning metrics. The quality of the generated captions and the model's capacity to generalize to new data are explored by employing the model on a new collection of paintings and analyzing the relation between commonly generated captions and artistic genre. The overall results suggest that the model can generate meaningful captions that exhibit a stronger relevance to the art historical context, particularly in comparison to captions obtained from models trained only on natural image datasets.
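The quantitative evaluation mentioned above relies on standard image captioning metrics such as BLEU. As a minimal sketch of how such a metric compares a generated caption against a reference, the following implements sentence-level BLEU with uniform n-gram weights and a brevity penalty; the example captions are hypothetical and not drawn from the dataset.

```python
# Minimal sentence-level BLEU: clipped n-gram precision (n = 1..4),
# geometric mean, and a brevity penalty for short candidates.
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
        overlap = sum((cand & ref).values())   # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean

reference = "mary with the christ child and saints".split()
candidate = "mary with the christ child".split()
print(round(bleu(candidate, reference), 3))  # → 0.67
```

Here every candidate n-gram appears in the reference, so all precisions are 1.0 and the score reduces to the brevity penalty exp(1 - 7/5) ≈ 0.67, illustrating how the metric penalizes captions that are correct but too short.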

