Aesthetic Visual Question Answering of Photographs

08/10/2022
by Xin Jin, et al.

Aesthetic assessment of images can be categorized into two main forms: numerical assessment and language assessment. Aesthetic captioning of photographs is the only aesthetic language assessment task that has been addressed so far. In this paper, we propose a new aesthetic language assessment task: aesthetic visual question answering (AVQA) of images. Given a question about the aesthetics of an image, the model predicts the answer. We use images from www.flickr.com. The objective QA pairs are generated by the proposed aesthetic attribute analysis algorithms. Moreover, we introduce subjective QA pairs that are converted from aesthetic numerical labels and from sentiment analysis performed by large-scale pre-trained models. We build the first aesthetic visual question answering dataset, AesVQA, which contains 72,168 high-quality images and 324,756 aesthetic question-answer pairs. We propose two methods for adjusting the data distribution and show that they improve the accuracy of existing models. This is the first work that both addresses the task of aesthetic VQA and introduces subjectivity into VQA tasks. The experimental results reveal that our methods outperform other VQA models on this new task.
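As a rough illustration of how a numerical aesthetic label might be converted into a subjective QA pair, the sketch below bins a score into a coarse answer. The 1-10 score scale, the thresholds, the question template, the answer vocabulary, and the names `QAPair` and `score_to_qa` are all assumptions made for illustration; the abstract does not specify the conversion actually used. The last lines only restate the dataset sizes given above and derive the average number of QA pairs per image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One question-answer pair attached to an image (hypothetical structure)."""
    image_id: str
    question: str
    answer: str


def score_to_qa(image_id: str, score: float,
                low: float = 4.0, high: float = 6.0) -> QAPair:
    """Bin a numerical aesthetic score into a coarse subjective answer.

    Assumes scores lie on a 1-10 scale; `low` and `high` are
    illustrative thresholds, not values taken from the paper.
    """
    if not 1.0 <= score <= 10.0:
        raise ValueError(f"score {score} is outside the assumed 1-10 range")
    if score < low:
        answer = "low"
    elif score < high:
        answer = "medium"
    else:
        answer = "high"
    # Hypothetical question template.
    question = "What is the aesthetic quality of this photo?"
    return QAPair(image_id, question, answer)


# Dataset sizes as reported in the abstract.
NUM_IMAGES = 72_168
NUM_QA_PAIRS = 324_756
pairs_per_image = NUM_QA_PAIRS / NUM_IMAGES  # average QA pairs per image
```

With these assumed thresholds, a score of 7.2 maps to the answer "high", and the reported sizes work out to an average of 4.5 QA pairs per image.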
