Aesthetic Image Captioning From Weakly-Labelled Photographs

08/29/2019
by   Koustav Ghosal, et al.

Aesthetic image captioning (AIC) refers to the multi-modal task of generating critical textual feedback for photographs. While in natural image captioning (NIC), deep models are trained in an end-to-end manner using large curated datasets such as MS-COCO, no such large-scale, clean dataset exists for AIC. Towards this goal, we propose an automatic cleaning strategy to create a benchmarking AIC dataset, by exploiting the images and noisy comments easily available from photography websites. We propose a probabilistic caption-filtering method for cleaning the noisy web data, and compile a large-scale, clean dataset, "AVA-Captions" (230,000 images with 5 captions per image). Additionally, by exploiting the latent associations between aesthetic attributes, we propose a strategy for training the convolutional neural network (CNN) based visual feature extractor, the first component of the AIC framework. The strategy is weakly supervised and can be effectively used to learn rich aesthetic representations, without requiring expensive ground-truth annotations. Finally, we showcase a thorough analysis of the proposed contributions using automatic metrics and subjective evaluations.
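The abstract does not spell out how the probabilistic caption filter works. Purely as an illustration of the general idea, the sketch below scores each comment by the average negative log-probability of its words under corpus unigram frequencies and keeps comments above a threshold, so that generic remarks built from very common words are dropped. The scoring rule, the function names (`unigram_probs`, `informativeness`, `filter_captions`), and the threshold are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_probs(captions):
    """Estimate unigram probabilities from the caption corpus itself."""
    counts = Counter(w for c in captions for w in c.lower().split())
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def informativeness(caption, probs):
    """Average negative log-probability of the caption's words.

    Assumed scoring rule (hypothetical): rarer words carry more
    information, so captions made of very common words score low.
    Only intended for captions drawn from the corpus used for `probs`.
    """
    words = caption.lower().split()
    if not words:
        return 0.0
    return -sum(math.log(probs[w]) for w in words) / len(words)


def filter_captions(captions, threshold):
    """Keep captions whose score reaches the (assumed) threshold."""
    probs = unigram_probs(captions)
    return [c for c in captions if informativeness(c, probs) >= threshold]


if __name__ == "__main__":
    noisy = [
        "nice shot",
        "nice shot",
        "nice shot",
        "the shallow depth of field isolates the subject",
    ]
    # With this toy corpus, the repeated generic comment scores lowest.
    for caption in filter_captions(noisy, threshold=2.0):
        print(caption)
```

In practice a threshold like this would have to be tuned on held-out comments, and a real filter would likely look at more than single words; the sketch is only meant to make the notion of probabilistic filtering concrete.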


