How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

12/02/2018
by   Shaojie Wang, et al.

Understanding web instructional videos is an essential branch of video understanding for two reasons. First, most existing video methods focus on short-term actions in clips only a few seconds long; these methods are not directly applicable to long videos. Second, unlike unconstrained long videos, e.g., movies, instructional videos are more structured in that they follow a step-by-step procedure that constrains the understanding task. In this paper, we study reasoning on instructional videos via question answering (QA). Surprisingly, this has not been an emphasis in the video community despite its rich applications. We therefore introduce YouQuek, an annotated QA dataset for instructional videos based on the recent YouCook2 dataset. The questions in YouQuek are not limited to cues in a single frame but require logical reasoning along the temporal dimension. Observing the lack of effective representations for modeling long videos, we propose a set of carefully designed models, including a novel Recurrent Graph Convolutional Network (RGCN) that captures both temporal order and relation information. Furthermore, we study multiple modalities, including descriptions and transcripts, to boost video understanding. Extensive experiments on YouQuek suggest that RGCN performs best in terms of QA accuracy and that adding human-annotated descriptions further improves performance.

research
06/06/2019

ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering

Recent developments in modeling language and vision have been successful...
research
06/26/2023

FunQA: Towards Surprising Video Comprehension

Surprising videos, e.g., funny clips, creative performances, or visual i...
research
05/18/2021

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

We introduce NExT-QA, a rigorously designed video question answering (Vi...
research
12/06/2016

MarioQA: Answering Questions by Watching Gameplay Videos

We present a framework to analyze various aspects of models for video qu...
research
03/22/2021

Extracting the Unknown from Long Math Problems

In problem solving, understanding the problem that one seeks to solve is...
research
08/07/2020

Location-aware Graph Convolutional Networks for Video Question Answering

We addressed the challenging task of video question answering, which req...
research
08/13/2021

A Dataset for Answering Time-Sensitive Questions

Time is an important dimension in our physical world. Lots of facts can ...
