Learning Pixel-Level Distinctions for Video Highlight Detection

04/10/2022
by Fanyue Wei, et al.

The goal of video highlight detection is to select the most attractive segments from a long video to depict its most interesting parts. Existing methods typically focus on modeling the relationship between different video segments in order to learn a model that can assign highlight scores to these segments; however, these approaches do not explicitly consider the contextual dependency within individual segments. To this end, we propose to learn pixel-level distinctions to improve video highlight detection. Such a pixel-level distinction indicates whether each pixel in a video belongs to an interesting section. The advantages of modeling such fine-grained distinctions are two-fold. First, it allows us to exploit the temporal and spatial relations of the content within a video, since the distinction of a pixel in one frame depends both on the content before that frame and on the content around that pixel within the frame. Second, learning the pixel-level distinction also provides a good explanation of the video highlight task regarding which contents in a highlight segment are attractive to people. We design an encoder-decoder network to estimate the pixel-level distinction, in which we leverage 3D convolutional neural networks to exploit temporal context information and further take advantage of visual saliency to model the spatial distinction. State-of-the-art performance on three public benchmarks clearly validates the effectiveness of our framework for video highlight detection.
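To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of the kind of encoder-decoder the abstract describes: a small 3D-convolutional encoder aggregates temporal context across frames, and a decoder upsamples back to the input resolution to produce a per-pixel distinction map in [0, 1]. The module name, layer sizes, and the mean-pooling used to derive a segment-level score are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class PixelDistinctionNet(nn.Module):
    """Hypothetical sketch: maps a clip of T frames to a per-pixel
    "distinction" map in [0, 1]. Not the paper's exact architecture."""

    def __init__(self, in_channels=3, base=32):
        super().__init__()
        # Encoder: 3D convolutions aggregate temporal and spatial context.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, base, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(base, base * 2, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder: transposed convolutions restore the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(base, 1, kernel_size=(1, 2, 2), stride=(1, 2, 2)),
            nn.Sigmoid(),  # per-pixel probability of belonging to a highlight
        )

    def forward(self, clip):
        # clip: (B, C, T, H, W) -> distinction map: (B, 1, T, H, W)
        return self.decoder(self.encoder(clip))

if __name__ == "__main__":
    net = PixelDistinctionNet()
    clip = torch.randn(2, 3, 8, 64, 64)  # batch of 2 clips, 8 frames each
    dist = net(clip)
    # One simple way to obtain a segment-level highlight score from the
    # pixel-level map is to average it over space and time.
    score = dist.mean(dim=(1, 2, 3, 4))
    print(dist.shape, score.shape)  # (2, 1, 8, 64, 64) and (2,)
```

Averaging the predicted map over space and time, as in the usage above, illustrates one way a pixel-level formulation can recover the segment-level highlight scores that existing methods predict directly.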


Related research

- Video Smoke Detection Based on Deep Saliency Network (09/08/2018): Video smoke detection is a promising fire detection method especially in...
- Decomposing Motion and Content for Natural Video Sequence Prediction (06/25/2017): We propose a deep neural network for the prediction of future frames in...
- MSnet: Mutual Suppression Network for Disentangled Video Representations (04/13/2018): The extraction of meaningful features from videos is important as they c...
- Learning to Associate Every Segment for Video Panoptic Segmentation (06/17/2021): Temporal correspondence - linking pixels or objects across frames - is a...
- The Animation Transformer: Visual Correspondence via Segment Matching (09/06/2021): Visual correspondence is a fundamental building block on the way to buil...
- Adaptive Video Highlight Detection by Learning from User History (07/19/2020): Recently, there is an increasing interest in highlight detection researc...
- Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation (10/16/2020): Video generation models often operate under the assumption of fixed fram...
