KnowIT VQA: Answering Knowledge-Based Questions about Videos

10/23/2019
by Noa Garcia, et al.

We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, textual, and temporal coherence reasoning with knowledge-based questions, which can only be answered with the experience gained from watching the series. Second, we propose a video understanding model that combines the visual and textual video content with specific knowledge about the show. Our main findings are: (i) incorporating knowledge produces outstanding improvements for VQA in video, and (ii) performance on KnowIT VQA still lags well behind human accuracy, indicating its usefulness for studying the limitations of current video modelling.


research
04/17/2020

Knowledge-Based Visual Question Answering in Videos

We propose a novel video understanding task by fusing knowledge-based an...
research
11/07/2022

CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

Videos often capture objects, their visible properties, their motion, an...
research
02/08/2022

NEWSKVQA: Knowledge-Aware News Video Question Answering

Answering questions in the context of videos can be helpful in video ind...
research
11/17/2021

Achieving Human Parity on Visual Question Answering

The Visual Question Answering (VQA) task utilizes both visual image and ...
research
10/08/2020

Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset

Modern social intelligence includes the ability to watch videos and answ...
research
09/04/2023

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

Researchers have extensively studied the field of vision and language, d...
research
12/18/2020

On Modality Bias in the TVQA Dataset

TVQA is a large scale video question answering (video-QA) dataset based ...
