TVQA+: Spatio-Temporal Grounding for Video Question Answering

04/25/2019
by Jie Lei, et al.

We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos. We first augment the TVQA dataset with 310.8k bounding boxes, linking depicted objects to visual concepts in questions and answers. We call this augmented version TVQA+. We then propose Spatio-Temporal Answerer with Grounded Evidence (STAGE), a unified framework that grounds evidence in both the spatial and temporal domains to answer questions about videos. Comprehensive experiments and analyses demonstrate the effectiveness of our framework and show how the rich annotations in TVQA+ can contribute to the question answering task. As a by-product of performing this joint task, our model produces more interpretable intermediate results. Dataset and code are publicly available.
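The abstract describes TVQA+ as linking bounding boxes on video frames to visual-concept words that appear in questions and answers. The sketch below illustrates that kind of linkage as a plain data structure. It is a minimal illustration under stated assumptions: the class names, field names, and coordinate layout are hypothetical and are not the released TVQA+ annotation format, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BoundingBox:
    """One box on one frame (hypothetical schema, not the official format)."""
    frame_id: int
    label: str      # visual concept the box depicts, e.g. "woman" or "cup"
    left: float
    top: float
    width: float
    height: float


@dataclass
class GroundedQA:
    """A question with candidate answers and the boxes grounding its concepts."""
    question: str
    answers: List[str]
    boxes: List[BoundingBox] = field(default_factory=list)

    def boxes_for_concept(self, concept: str) -> List[BoundingBox]:
        """Return every box whose label matches the concept (case-insensitive)."""
        target = concept.lower()
        return [b for b in self.boxes if b.label.lower() == target]

    def grounded_concepts(self) -> Set[str]:
        """Box labels that also occur as words in the question or answers."""
        text = " ".join([self.question, *self.answers]).lower()
        words = {w.strip("?.,!") for w in text.split()}
        return {b.label.lower() for b in self.boxes if b.label.lower() in words}


# Invented example: two frames, with the person and the object boxed.
qa = GroundedQA(
    question="What is the woman holding when she opens the door?",
    answers=["A cup", "A laptop"],
    boxes=[
        BoundingBox(12, "woman", 10, 20, 100, 200),
        BoundingBox(12, "cup", 50, 60, 20, 30),
        BoundingBox(13, "woman", 12, 22, 100, 200),
    ],
)

print(sorted(qa.grounded_concepts()))       # ['cup', 'woman']
print(len(qa.boxes_for_concept("Woman")))   # 2
```

Keeping boxes keyed by a concept label rather than by frame makes it easy to ask which frames show a concept mentioned in the question. A real pipeline would also need to handle multi-word concepts and name aliases, which this sketch ignores.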
