Fill-in-the-blank as a Challenging Video Understanding Evaluation Framework

04/09/2021
by Santiago Castro, et al.

Work to date on language-informed video understanding has primarily addressed two tasks: (1) video question answering using multiple-choice questions, where models perform relatively well because they exploit the fact that candidate answers are readily available; and (2) video captioning, which relies on an open-ended evaluation framework that is often inaccurate because system answers may be judged incorrect if they differ in form from the ground truth. In this paper, we propose fill-in-the-blanks as a video understanding evaluation framework that addresses these evaluation drawbacks and more closely reflects real-life settings where no multiple choices are given. The task tests a system's understanding of a video by requiring the model to predict a masked noun phrase in the caption of the video, given the video and the surrounding text. We introduce a novel dataset consisting of 28,000 videos and fill-in-the-blank tests. We show that both a multimodal model and a strong language model fall well short of human performance, suggesting that the task is more challenging than current video understanding benchmarks.
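The task format described above can be illustrated with a minimal sketch. It assumes a caption string and a known noun phrase to mask, and it scores a prediction by normalized exact match and token-level F1. The function names, the blank token, the normalization rules, and the example caption are all hypothetical choices made for illustration; the abstract does not specify the scoring procedure.

```python
import string
from collections import Counter

BLANK = "_____"  # hypothetical blank token, not taken from the paper


def mask_phrase(caption: str, phrase: str) -> str:
    """Replace the first occurrence of `phrase` in `caption` with a blank."""
    if phrase not in caption:
        raise ValueError("phrase not found in caption")
    return caption.replace(phrase, BLANK, 1)


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, drop leading articles, and tokenize."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    while tokens and tokens[0] in {"a", "an", "the"}:
        tokens = tokens[1:]
    return tokens


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a prediction and a single reference."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Illustrative example (invented caption, not from the dataset).
caption = "A man throws a red ball to his dog."
masked = mask_phrase(caption, "a red ball")
```

Accepting several reference answers and giving partial credit through token overlap is one way to reduce the form-mismatch problem the abstract attributes to open-ended captioning evaluation, while still requiring the model to generate an answer rather than pick one from a list.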


