MarioQA: Answering Questions by Watching Gameplay Videos

12/06/2016
by Jonghwan Mun, et al.

We present a framework to analyze various aspects of models for video question answering (VideoQA) using customizable synthetic datasets, which are constructed automatically from gameplay videos. Our work is motivated by the fact that existing models are often tested only on datasets that either require excessively high-level reasoning or mostly contain instances that can be answered from a single frame. Hence, it is difficult to measure the capacity and flexibility of trained models, and existing techniques often rely on ad-hoc implementations of deep neural networks without clear insight into datasets and models. We are particularly interested in understanding temporal relationships between video events to solve VideoQA problems, because reasoning about temporal dependency is one of the components that most clearly distinguishes videos from images. To this end, we automatically generate a customized synthetic VideoQA dataset from Super Mario Bros. gameplay videos so that it contains events with different levels of reasoning complexity. Using the dataset, we show that properly constructed datasets with events at various complexity levels are critical for learning effective models and improving overall performance.


research
11/21/2019

Temporal Reasoning via Audio Question Answering

Multimodal question answering tasks can be used as proxy tasks to study ...
research
09/04/2023

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

Researchers have extensively studied the field of vision and language, d...
research
12/02/2018

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

Understanding web instructional videos is an essential branch of video u...
research
01/17/2021

HySTER: A Hybrid Spatio-Temporal Event Reasoner

The task of Video Question Answering (VideoQA) consists in answering nat...
research
04/19/2018

Video based Contextual Question Answering

The primary aim of this project is to build a contextual Question-Answer...
research
07/10/2017

Automatic Understanding of Image and Video Advertisements

There is more to images than their objective physical content: for examp...
research
03/08/2016

Learning to Blend Computer Game Levels

We present an approach to generate novel computer game levels that blend...
