Knowledge-Based Visual Question Answering in Videos

04/17/2020
by Noa Garcia, et al.

We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, textual, and temporal coherence reasoning with knowledge-based questions, which can only be answered with the experience gained from watching the series. Second, we propose a video understanding model that combines the visual and textual video content with specific knowledge about the show. Our main findings are: (i) incorporating knowledge produces outstanding improvements for VQA in video, and (ii) performance on KnowIT VQA still lags well behind human accuracy, indicating its usefulness for studying the limitations of current video modelling.


