Shot Contrastive Self-Supervised Learning for Scene Boundary Detection

04/28/2021
by   Shixing Chen, et al.

Scenes play a crucial role in breaking the storyline of movies and TV episodes into semantically cohesive parts. However, given their complex temporal structure, finding scene boundaries can be a challenging task requiring large amounts of labeled training data. To address this challenge, we present a self-supervised shot contrastive learning approach (ShotCoL) to learn a shot representation that maximizes the similarity between nearby shots compared to randomly selected shots. We show how to apply our learned shot representation for the task of scene boundary detection to offer state-of-the-art performance on the MovieNet dataset while requiring only ~25% of the training labels, using 9x fewer model parameters and offering 7x faster runtime. To assess the effectiveness of ShotCoL on novel applications of scene boundary detection, we take on the problem of finding timestamps in movies and TV episodes where video-ads can be inserted while offering a minimally disruptive viewing experience. To this end, we collected a new dataset called AdCuepoints with 3,975 movies and TV episodes, 2.2 million shots and 19,119 minimally disruptive ad cue-point labels. We present a thorough empirical analysis on this dataset demonstrating the effectiveness of ShotCoL for ad cue-points detection.
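The core idea of the abstract, pulling a shot's representation toward nearby shots and away from randomly selected ones, can be illustrated with a small sketch. This is not the authors' implementation. It assumes toy 2-D embeddings in place of a learned encoder, cosine similarity, an InfoNCE-style loss, and a positive chosen as the most similar shot within a fixed window. The names `pick_positive`, `contrastive_loss`, `window`, and `temperature` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_positive(shots, query_idx, window):
    """Return the index of the nearby shot most similar to the query.

    Assumption: "nearby" means within `window` shots on either side.
    """
    lo = max(0, query_idx - window)
    hi = min(len(shots), query_idx + window + 1)
    neighbors = [i for i in range(lo, hi) if i != query_idx]
    return max(neighbors, key=lambda i: cosine(shots[query_idx], shots[i]))


def contrastive_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the query is closer to the
    positive than to the negatives (assumed form, for illustration)."""
    pos = math.exp(cosine(query, positive) / temperature)
    neg = sum(math.exp(cosine(query, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings: shots 0-2 resemble one scene, shots 3-5 another.
shots = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0],
]

query_idx = 1
pos_idx = pick_positive(shots, query_idx, window=1)

# Negatives: randomly selected shots outside the query's neighborhood.
rng = random.Random(0)
neg_pool = [i for i in range(len(shots)) if abs(i - query_idx) > 1]
neg_idx = rng.sample(neg_pool, 2)

loss = contrastive_loss(
    shots[query_idx], shots[pos_idx], [shots[i] for i in neg_idx]
)
print(f"positive shot: {pos_idx}, negatives: {neg_idx}, loss: {loss:.4f}")
```

In a real training setup the embeddings would come from an encoder updated by gradient descent on this kind of loss. The sketch only evaluates the loss once on fixed vectors to show which pairs it rewards.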


