
Visual Question Answering on 360° Images

01/10/2020
by Shih-Han Chou, et al.

In this work, we introduce VQA 360°, a novel task of visual question answering on 360° images. Unlike a normal field-of-view image, a 360° image captures the entire visual content around the optical center of a camera, demanding more sophisticated spatial understanding and reasoning. To address this problem, we collect the first VQA 360° dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types. We then study two VQA models on VQA 360°: a conventional model that takes an equirectangular image (with its intrinsic distortion) as input, and a dedicated model that first projects a 360° image onto cubemaps and then aggregates information from multiple spatial resolutions. We demonstrate that the cubemap-based model with multi-level fusion and attention diffusion performs favorably against its other variants and against the equirectangular-based models. Nevertheless, the gap between human and machine performance reveals the need for more advanced VQA 360° algorithms. We therefore expect our dataset and studies to serve as a benchmark for future development in this challenging task. Dataset, code, and pre-trained models are available online.
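
The abstract mentions projecting an equirectangular 360° image onto cubemaps before further processing. As a rough illustration of that geometric step only (not the authors' model or code), the following sketch resamples one cube face from an equirectangular image with nearest-neighbor lookup. The function name `equirect_to_cube_face`, the axis convention (x right, y up, z forward), and the face names are assumptions made for this example.

```python
import numpy as np

def equirect_to_cube_face(equi, face, size):
    """Sample one cube face (size x size) from an equirectangular image.

    Hypothetical helper for illustration. Assumes `equi` has shape (H, W)
    or (H, W, C), with longitude spanning the width and latitude the height.
    """
    h, w = equi.shape[:2]
    # Pixel-center coordinates in [-1, 1] across the face.
    coords = (np.arange(size) + 0.5) * 2.0 / size - 1.0
    a, b = np.meshgrid(coords, coords)  # a: left to right, b: top to bottom
    one = np.ones_like(a)

    # Viewing direction for every face pixel (assumed axis convention).
    directions = {
        "front": (a, -b, one),
        "right": (one, -b, -a),
        "back": (-a, -b, -one),
        "left": (-one, -b, a),
        "up": (a, one, b),
        "down": (a, -one, -b),
    }
    x, y, z = directions[face]

    # Convert directions to spherical angles.
    lon = np.arctan2(x, z)                           # [-pi, pi]
    lat = np.arcsin(y / np.sqrt(x**2 + y**2 + z**2)) # [-pi/2, pi/2]

    # Map angles to source pixel indices (nearest neighbor).
    cols = np.floor((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    rows = np.clip(np.floor((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return equi[rows, cols]

# Usage: a tiny synthetic panorama whose pixel value encodes its position.
pano = np.arange(4 * 8).reshape(4, 8)
front = equirect_to_cube_face(pano, "front", size=3)
right = equirect_to_cube_face(pano, "right", size=3)
```

Nearest-neighbor sampling keeps the sketch short; a practical pipeline would more likely use bilinear interpolation, and the six faces would then be fed to whatever feature extractor the model uses.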


03/01/2019

Answer Them All! Toward Universal Visual Question Answering Models

Visual Question Answering (VQA) research is split into two camps: the fi...
10/26/2022

What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?

In visual question answering (VQA), a machine must answer a question giv...
09/19/2017

Exploring Human-like Attention Supervision in Visual Question Answering

Attention mechanisms have been widely applied in the Visual Question Ans...
03/16/2018

Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool

In recent years, visual question answering (VQA) has become topical. The...
01/23/2023

HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images

Visual question answering (VQA) is an important and challenging multimod...
08/17/2019

What is needed for simple spatial language capabilities in VQA?

Visual question answering (VQA) comprises a variety of language capabili...