Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization

11/18/2022
by Zongshang Pang, et al.

Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Unsupervised methods usually rely on heuristic training objectives such as diversity and representativeness. However, such methods need to bootstrap the online-generated summaries to compute the objectives for importance score regression. We consider such a pipeline inefficient and instead seek to quantify frame-level importance directly, with the help of contrastive losses from the representation learning literature. Leveraging these losses, we propose three metrics that characterize a desirable key frame: local dissimilarity, global consistency, and uniqueness. With features pre-trained on image classification, the metrics already yield high-quality importance scores, performing competitively with or better than past heavily trained methods. We show that refining the pre-trained features with a lightweight, contrastively learned projection module further improves the frame-level importance scores, and that the model can also leverage a large number of random videos and generalize to test videos with decent performance. Code is available at https://github.com/pangzss/pytorch-CTVSUM.
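
To make the three metrics more concrete, the following is a minimal sketch of how per-frame scores of this kind could be computed from pre-extracted frame features. It is not the authors' implementation: the abstract names the metrics but does not define them, so the formulas here are assumptions (squared distance to the nearest neighbours for local dissimilarity, cosine similarity to the mean feature for global consistency, and a uniformity-style log-mean-exp term for uniqueness). The function name `frame_scores` and the parameters `k` and `t` are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def frame_scores(features, k=2, t=2.0):
    """Return one (local_dissimilarity, global_consistency, uniqueness)
    tuple per frame.

    ASSUMPTION: these formulas are one plausible reading of the three metric
    names, not the definitions from the paper. `k` (number of neighbours) and
    `t` (temperature) are illustrative parameters.
    """
    if len(features) < 2:
        raise ValueError("need at least two frames")
    feats = [_normalize(f) for f in features]
    n, dim = len(feats), len(feats[0])
    # Unit-normalized mean feature of the whole video.
    mean = _normalize([sum(f[d] for f in feats) / n for d in range(dim)])

    scores = []
    for i, f in enumerate(feats):
        # Cosine similarities to every other frame, most similar first.
        sims = sorted(
            (_dot(f, g) for j, g in enumerate(feats) if j != i), reverse=True
        )
        nearest = sims[:k]
        # For unit vectors, squared Euclidean distance equals 2 - 2 * cosine.
        local = sum(2 - 2 * s for s in nearest) / len(nearest)
        global_c = _dot(f, mean)
        # Negated log-mean-exp of scaled squared distances: larger values
        # mean the frame lies farther from the rest of the video.
        uniq = -math.log(
            sum(math.exp(-t * (2 - 2 * s)) for s in sims) / len(sims)
        )
        scores.append((local, global_c, uniq))
    return scores


if __name__ == "__main__":
    # Toy features: two identical frames and one orthogonal frame.
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for idx, triple in enumerate(frame_scores(toy, k=1)):
        print(idx, triple)
```

How the three values would be combined into a single importance score is left open here, since the abstract does not specify it.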

research
03/21/2023

Visual Representation Learning from Unlabeled Video using Contrastive Masked Autoencoders

Masked Autoencoders (MAEs) learn self-supervised representations by rand...
research
08/06/2022

Frozen CLIP Models are Efficient Video Learners

Video recognition has been dominated by the end-to-end learning paradigm...
research
03/22/2023

Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

Sequential video understanding, as an emerging video understanding task,...
research
03/13/2023

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

The goal of multimodal summarization is to extract the most important in...
research
01/12/2023

Learning to Summarize Videos by Contrasting Clips

Video summarization aims at choosing parts of a video that narrate a sto...
research
03/28/2023

SELF-VS: Self-supervised Encoding Learning For Video Summarization

Despite its wide range of applications, video summarization is still hel...
research
08/23/2017

CNN-Based Prediction of Frame-Level Shot Importance for Video Summarization

In the Internet, ubiquitous presence of redundant, unedited, raw videos ...
