VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering

08/14/2019
by   Cătălina Cangea, et al.

Embodied Question Answering (EQA) is a recently proposed task in which an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation, and language understanding in order to perform complex reasoning in the visual world. However, initial attempts to combine standard vision and language methods with imitation and reinforcement learning algorithms suggest that EQA may be too complex and challenging for these techniques. To investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset, which contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly ideal navigation paths, while considering a much wider variety of questions than current instantiations of the EQA task. We evaluate several models, adapted from popular VQA methods, on this new benchmark, establishing an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.
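The abstract mentions adapting popular VQA methods to question answering over navigation videos. As a rough illustration of what such a baseline can look like, the sketch below (a minimal, hypothetical PyTorch model, not the authors' released code) encodes the question with an LSTM, average-pools pre-extracted per-frame visual features over the video, and classifies the fused representation into an answer. All layer sizes, the vocabulary size, and the number of answer classes are illustrative assumptions.

```python
# Minimal, hypothetical VQA-style video QA baseline (illustrative only).
import torch
import torch.nn as nn

class VideoQABaseline(nn.Module):
    def __init__(self, vocab_size=1000, num_answers=70, embed_dim=128,
                 hidden_dim=256, frame_feat_dim=512):
        super().__init__()
        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Visual encoder: project pre-extracted per-frame CNN features.
        self.frame_proj = nn.Linear(frame_feat_dim, hidden_dim)
        # Classifier over the fused question/video representation.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, question_tokens, frame_features):
        # question_tokens: (batch, seq_len) int64 word indices
        # frame_features:  (batch, num_frames, frame_feat_dim) float32
        _, (h_n, _) = self.lstm(self.embed(question_tokens))
        q_repr = h_n[-1]                                  # (batch, hidden_dim)
        v_repr = self.frame_proj(frame_features).mean(1)  # average-pool over frames
        fused = torch.cat([q_repr, v_repr], dim=-1)
        return self.classifier(fused)                     # answer logits

# Example usage with random tensors standing in for a (question, video) pair.
model = VideoQABaseline()
logits = model(torch.randint(0, 1000, (2, 12)), torch.randn(2, 20, 512))
print(logits.shape)  # torch.Size([2, 70])
```

Stronger variants from the VQA literature typically replace the average pooling and simple concatenation with attention or temporal models; comparing such choices is the kind of study the benchmark is intended to support.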


