NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

05/18/2021
by Junbin Xiao, et al.

We introduce NExT-QA, a rigorously designed video question answering (VideoQA) benchmark to advance video understanding from describing to explaining temporal actions. Based on the dataset, we set up multi-choice and open-ended QA tasks targeting causal action reasoning, temporal action reasoning, and common scene comprehension. Through extensive analysis of baselines and established VideoQA techniques, we find that top-performing methods excel at shallow scene descriptions but are weak in causal and temporal action reasoning. Furthermore, models that are effective on multi-choice QA still struggle to generalize their answers when adapted to open-ended QA. This casts doubt on the ability of these models to reason and highlights possibilities for improvement. With detailed results for different question types and heuristic observations for future work, we hope NExT-QA will guide the next generation of VQA research to go beyond superficial scene description towards a deeper understanding of videos. (The dataset and related resources are available at https://github.com/doc-doc/NExT-QA.git)

research
05/30/2022

From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Video understanding has achieved great success in representation learnin...
research
08/18/2021

MeDiaQA: A Question Answering Dataset on Medical Dialogues

In this paper, we introduce MeDiaQA, a novel question answering (QA) data...
research
06/26/2023

FunQA: Towards Surprising Video Comprehension

Surprising videos, e.g., funny clips, creative performances, or visual i...
research
12/02/2018

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

Understanding web instructional videos is an essential branch of video u...
research
08/13/2021

A Dataset for Answering Time-Sensitive Questions

Time is an important dimension in our physical world. Lots of facts can ...
research
10/08/2022

EgoTaskQA: Understanding Human Tasks in Egocentric Videos

Understanding human tasks through video observations is an essential cap...
research
05/22/2022

Interpretable Proof Generation via Iterative Backward Reasoning

We present IBR, an Iterative Backward Reasoning model to solve the proof...
