HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images

01/23/2023
by Kun Li, et al.

Visual question answering (VQA) is an important and challenging multimodal task in computer vision. Recently, a few efforts have been made to bring the VQA task to aerial images because of its potential real-world applications in disaster monitoring, urban planning, and digital earth product generation. However, development of VQA in this domain is restricted both by the huge variation in the appearance, scale, and orientation of concepts in aerial images and by the scarcity of well-annotated datasets. In this paper, we introduce a new dataset, HRVQA, which provides 53,512 collected aerial images of 1024×1024 pixels and 1,070,240 semi-automatically generated QA pairs. To benchmark the ability of VQA models to understand aerial images, we evaluate relevant methods on HRVQA. Moreover, we propose a novel model, GFTransformer, with gated attention modules and a mutual fusion module. The experiments show that the proposed dataset is quite challenging, especially for questions about specific attributes. Our method achieves superior performance compared with previous state-of-the-art approaches. The dataset and the source code will be released at https://hrvqa.nl/.
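
As a quick check on the dataset scale stated in the abstract, the reported counts imply an average of 20 QA pairs per image, with each image covering roughly one megapixel. This is simple arithmetic on the abstract's own figures, not a statistic separately reported by the authors:

```latex
\frac{1\,070\,240~\text{QA pairs}}{53\,512~\text{images}} = 20~\text{QA pairs per image (on average)}
\qquad
1024 \times 1024 = 1\,048\,576~\text{pixels} \approx 1.05~\text{megapixels per image}
```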

research
08/10/2022

Aesthetic Visual Question Answering of Photographs

Aesthetic assessment of images can be categorized into two main forms: n...
research
08/05/2022

ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Visual question answering is an important task in both natural language ...
research
11/15/2022

MapQA: A Dataset for Question Answering on Choropleth Maps

Choropleth maps are a common visual representation for region-specific t...
research
01/10/2020

Visual Question Answering on 360° Images

In this work, we introduce VQA 360, a novel task of visual question answ...
research
10/28/2020

Leveraging Visual Question Answering to Improve Text-to-Image Synthesis

Generating images from textual descriptions has recently attracted a lot...
research
08/09/2017

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

This paper presents a state-of-the-art model for visual question answeri...
research
03/26/2018

Generalized Hadamard-Product Fusion Operators for Visual Question Answering

We propose a generalized class of multimodal fusion operators for the ta...