FunQA: Towards Surprising Video Comprehension

06/26/2023
by Binzhu Xie, et al.

Surprising videos, e.g., funny clips, creative performances, or visual illusions, attract significant attention. Enjoyment of these videos is not simply a response to visual stimuli; rather, it hinges on the human capacity to understand (and appreciate) commonsense violations depicted in these videos. We introduce FunQA, a challenging video question answering (QA) dataset specifically designed to evaluate and enhance the depth of video reasoning based on counter-intuitive and fun videos. Unlike most video QA benchmarks, which focus on less surprising contexts, e.g., cooking or instructional videos, FunQA covers three previously unexplored types of surprising videos: 1) HumorQA, 2) CreativeQA, and 3) MagicQA. For each subset, we establish rigorous QA tasks designed to assess the model's capability in counter-intuitive timestamp localization, detailed video description, and reasoning around counter-intuitiveness. We also pose higher-level tasks, such as attributing a fitting and vivid title to the video and scoring the video's creativity. In total, the FunQA benchmark consists of 312K free-text QA pairs derived from 4.3K video clips, spanning a total of 24 video hours. Extensive experiments with existing VideoQA models reveal significant performance gaps on the FunQA videos across spatial-temporal reasoning, visual-centered reasoning, and free-text generation.
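
To put the headline figures in perspective, the following minimal sketch derives rough per-clip averages from the three numbers quoted in the abstract. The constants are the rounded values as stated there, so the results are approximations only; the function name `dataset_averages` is a hypothetical helper introduced for illustration and is not part of any FunQA tooling.

```python
# Headline figures as quoted in the abstract (rounded, so results are approximate).
QA_PAIRS = 312_000    # "312K free-text QA pairs"
VIDEO_CLIPS = 4_300   # "4.3K video clips"
TOTAL_HOURS = 24      # "24 video hours"


def dataset_averages(qa_pairs, clips, hours):
    """Hypothetical helper: return (QA pairs per clip, seconds per clip)."""
    return qa_pairs / clips, hours * 3600 / clips


qa_per_clip, seconds_per_clip = dataset_averages(QA_PAIRS, VIDEO_CLIPS, TOTAL_HOURS)
print(f"~{qa_per_clip:.1f} QA pairs per clip, ~{seconds_per_clip:.1f} s per clip")
```

Under these rounded inputs, each clip carries on the order of 70 QA pairs and lasts roughly 20 seconds, which suggests short clips with dense annotation.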

research
12/02/2018

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

Understanding web instructional videos is an essential branch of video u...
research
05/18/2021

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

We introduce NExT-QA, a rigorously designed video question answering (Vi...
research
11/10/2022

Watching the News: Towards VideoQA Models that can Read

Video Question Answering methods focus on commonsense reasoning and visu...
research
03/29/2021

TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Traffic event cognition and reasoning in videos is an important task tha...
research
07/10/2021

DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering

Video question answering is a challenging task, which requires agents to...
research
10/20/2020

BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues

Video-grounded dialogues are very challenging due to (i) the complexity ...
research
10/08/2022

EgoTaskQA: Understanding Human Tasks in Egocentric Videos

Understanding human tasks through video observations is an essential cap...
