Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

07/17/2020
by   Noa Garcia, et al.

To understand movies, humans constantly reason over the dialogues and actions shown in specific scenes and relate them to the overall storyline already seen. Inspired by this behaviour, we design ROLL, a model for knowledge-based video story question answering that leverages three crucial aspects of movie understanding: dialog comprehension, scene reasoning, and storyline recalling. In ROLL, each of these tasks extracts rich and diverse information by 1) processing scene dialogues, 2) generating unsupervised video scene descriptions, and 3) obtaining external knowledge in a weakly supervised fashion. To answer a given question correctly, the information generated by each cognitively inspired task is encoded via Transformers and fused through a modality weighting mechanism, which balances the information from the different sources. Exhaustive evaluation demonstrates the effectiveness of our approach, which yields a new state of the art on two challenging video question answering datasets: KnowIT VQA and TVQA+.
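
The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the form of the modality weighting mechanism, so everything here is an assumption made for illustration: each of the three branches is taken to produce one score per candidate answer, the branch weights are softmax-normalized scalars, and the names `fuse_scores`, `predict`, `dialog`, `scene`, and `knowledge` are hypothetical. In a trained model the weights would be learned; here they are passed in by hand.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(branch_scores, branch_logits):
    """Combine per-branch answer scores with normalized branch weights.

    branch_scores: dict mapping a branch name to a list of scores,
        one per candidate answer (assumed output of each branch).
    branch_logits: dict mapping a branch name to an unnormalized
        weight (hand-set here; assumed to be learned in practice).
    Returns the fused scores and the normalized weights.
    """
    names = sorted(branch_scores)
    weights = softmax([branch_logits[name] for name in names])
    n_candidates = len(branch_scores[names[0]])
    fused = [0.0] * n_candidates
    for name, weight in zip(names, weights):
        scores = branch_scores[name]
        if len(scores) != n_candidates:
            raise ValueError("all branches must score the same candidates")
        for i, score in enumerate(scores):
            fused[i] += weight * score
    return fused, dict(zip(names, weights))


def predict(branch_scores, branch_logits):
    """Return the index of the candidate answer with the highest fused score."""
    fused, _ = fuse_scores(branch_scores, branch_logits)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for four candidate answers from each branch.
scores = {
    "dialog": [0.1, 0.9, 0.0, 0.0],
    "scene": [0.2, 0.1, 0.6, 0.1],
    "knowledge": [0.0, 0.8, 0.1, 0.1],
}
equal = {"dialog": 0.0, "scene": 0.0, "knowledge": 0.0}
scene_heavy = {"dialog": 0.0, "scene": 10.0, "knowledge": 0.0}

print(predict(scores, equal))
print(predict(scores, scene_heavy))
```

With equal weights the branches vote together, while a large weight on one branch lets it dominate the prediction. This is the sense in which a weighting scheme can balance information from different sources.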


research
03/26/2021

On the hidden treasure of dialog in video question answering

High-level understanding of stories in video such as movies and TV shows...
research
09/11/2018

Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions

In-depth scene descriptions and question answering tasks have greatly in...
research
07/31/2019

Learning Question-Guided Video Representation for Multi-Turn Video Question Answering

Understanding and conversing about dynamic scenes is one of the key capa...
research
10/14/2022

SQA3D: Situated Question Answering in 3D Scenes

We propose a new task to benchmark scene understanding of embodied agent...
research
04/18/2019

Progressive Attention Memory Network for Movie Story Question Answering

This paper proposes the progressive attention memory network (PAMN) for ...
research
01/15/2021

Recent Advances in Video Question Answering: A Review of Datasets and Methods

Video Question Answering (VQA) is a recent emerging challenging task in ...
research
10/03/2018

Transfer Learning via Unsupervised Task Discovery for Visual Question Answering

We study how to leverage off-the-shelf visual and linguistic data to cop...
