VideoSET: Video Summary Evaluation through Text

by Serena Yeung, et al.

In this paper we present VideoSET, a method for Video Summary Evaluation through Text that can evaluate how well a video summary is able to retain the semantic information contained in its original video. We observe that semantics is most easily expressed in words, and develop a text-based approach for the evaluation. Given a video summary, a text representation of the video summary is first generated, and an NLP-based metric is then used to measure its semantic distance to ground-truth text summaries written by humans. We show that our technique has higher agreement with human judgment than pixel-based distance metrics. We also release text annotations and ground-truth text summaries for a number of publicly available video datasets, for use by the computer vision community.
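The abstract outlines the pipeline but does not name the NLP metric here. As a minimal sketch, the evaluation can be approximated with a ROUGE-1-style unigram-overlap F1 between the text representation of the video summary and each human-written ground-truth summary, taking the best match; the function names, the example texts, and the choice of unigram overlap are illustrative assumptions, not the paper's exact metric.

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between candidate and reference (ROUGE-1 style)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def videoset_score(summary_text: str, ground_truths: list[str]) -> float:
    """Proxy for semantic closeness: best overlap F1 against any human summary."""
    return max(rouge1_f(summary_text, gt) for gt in ground_truths)

# Hypothetical example: text representation of a video summary compared
# against two human-written ground-truth text summaries.
summary = "a man walks a dog in the park"
truths = ["a man is walking his dog through a park",
          "someone rides a bike down the street"]
print(round(videoset_score(summary, truths), 3))  # → 0.588
```

A higher score means the summary's text representation shares more content words with at least one human summary; the paper frames this as a semantic distance, so a distance variant would simply be one minus such a similarity.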


