SQA3D: Situated Question Answering in 3D Scenes

10/14/2022
by Xiaojian Ma, et al.

We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., a 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D poses a significant challenge to current multi-modal, especially 3D, reasoning models. We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capability.

research
09/16/2021

Knowledge-based Embodied Question Answering

In this paper, we propose a novel Knowledge-based Embodied Question Answ...
research
07/13/2021

Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering

Visual Question Answering (VQA) is concerned with answering free-form qu...
research
03/05/2023

VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning

The ideal form of Visual Question Answering requires understanding, grou...
research
06/01/2022

SAMPLE-HD: Simultaneous Action and Motion Planning Learning Environment

Humans exhibit incredibly high levels of multi-modal understanding - com...
research
05/03/2023

Contextual Reasoning for Scene Generation (Technical Report)

We present a continuation to our previous work, in which we developed th...
research
07/17/2020

Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

To understand movies, humans constantly reason over the dialogues and ac...
research
02/28/2019

From Visual to Acoustic Question Answering

We introduce the new task of Acoustic Question Answering (AQA) to promot...
