-
Temporal Reasoning via Audio Question Answering
Multimodal question answering tasks can be used as proxy tasks to study ...
read it
-
TutorialVQA: Question Answering Dataset for Tutorial Videos
Despite the number of currently available datasets on video question ans...
read it
-
How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos
Understanding web instructional videos is an essential branch of video u...
read it
-
HySTER: A Hybrid Spatio-Temporal Event Reasoner
The task of Video Question Answering (VideoQA) consists in answering nat...
read it
-
Video based Contextual Question Answering
The primary aim of this project is to build a contextual Question-Answer...
read it
-
Automatic Understanding of Image and Video Advertisements
There is more to images than their objective physical content: for examp...
read it
-
Constructing Hierarchical Q A Datasets for Video Story Understanding
Video understanding is emerging as a new paradigm for studying human-lik...
read it
MarioQA: Answering Questions by Watching Gameplay Videos
We present a framework to analyze various aspects of models for video question answering (VideoQA) using customizable synthetic datasets, which are constructed automatically from gameplay videos. Our work is motivated by the fact that existing models are often tested only on datasets that require excessively high-level reasoning or mostly contain instances accessible through single frame inferences. Hence, it is difficult to measure capacity and flexibility of trained models, and existing techniques often rely on ad-hoc implementations of deep neural networks without clear insight into datasets and models. We are particularly interested in understanding temporal relationships between video events to solve VideoQA problems; this is because reasoning temporal dependency is one of the most distinct components in videos from images. To address this objective, we automatically generate a customized synthetic VideoQA dataset using Super Mario Bros. gameplay videos so that it contains events with different levels of reasoning complexity. Using the dataset, we show that properly constructed datasets with events in various complexity levels are critical to learn effective models and improve overall performance.
READ FULL TEXT
Comments
There are no comments yet.