DramaQA: Character-Centered Video Story Understanding with Hierarchical QA

05/07/2020
by Seongho Choi, et al.

Despite recent progress in computer vision and natural language processing, developing video understanding intelligence remains difficult because of the intrinsic difficulty of stories in video. Moreover, there is no theoretical metric for evaluating the degree of video understanding. In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for a comprehensive understanding of the video story. DramaQA focuses on two perspectives: 1) hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence, and 2) character-centered video annotations to model local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and contains 16,191 QA pairs from 23,928 video clips of various lengths, with each QA pair belonging to one of four difficulty levels. We provide 217,308 annotated images with rich character-centered annotations, including visual bounding boxes, behaviors, and emotions of main characters, and coreference-resolved scripts. Additionally, we provide analyses of the dataset as well as a Dual Matching Multistream model, which effectively learns character-centered representations of video to answer questions about the video. We plan to release our dataset and model publicly for research purposes and expect that our work will provide a new perspective on video story understanding research.
