Learning Video Summarization Using Unpaired Data

05/30/2018
by Mrigank Rochan, et al.

We consider the problem of video summarization. Given a raw input video, the goal is to select a small subset of key frames to create a shorter summary video that best describes the content of the original. Most current state-of-the-art video summarization approaches use supervised learning and require paired training data, where each pair consists of a raw input video and a ground-truth summary video curated by human annotators. However, such paired data is difficult to create. To address this limitation, we propose a novel formulation that learns video summarization from unpaired data. Our method only requires training data in the form of two sets of videos: a set of raw videos (V) and a set of summary videos (S), with no correspondence between the two sets. We argue that this type of data is much easier to collect. We present an approach that learns to generate summary videos from such unpaired data by learning a mapping function F : V → S that makes the distribution of generated summaries F(V) similar to the distribution of S, while enforcing a high visual-diversity constraint on F(V). Experimental results on two benchmark datasets show that our proposed approach significantly outperforms alternative methods.
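The unpaired objective can be read as an adversarial game: the mapping F scores frames of a raw video, a discriminator tries to tell real summaries from S apart from generated summaries F(V), and a diversity penalty pushes the selected frames to be visually dissimilar. The following is a minimal PyTorch sketch of that idea only; the frame-feature dimension, the LSTM-based selector and discriminator, and the score-weighted diversity term are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F_

FEAT_DIM = 1024  # assumed per-frame feature size (e.g. from a pretrained CNN)

class Summarizer(nn.Module):
    """Mapping F: scores every frame of a raw video; high scores mark key frames."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEAT_DIM, 256, batch_first=True, bidirectional=True)
        self.head = nn.Linear(512, 1)

    def forward(self, frames):                            # frames: (1, T, FEAT_DIM)
        h, _ = self.lstm(frames)
        return torch.sigmoid(self.head(h)).squeeze(-1)    # (1, T) frame scores

class Discriminator(nn.Module):
    """Judges whether a (score-weighted) frame sequence looks like a real summary."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEAT_DIM, 256, batch_first=True)
        self.head = nn.Linear(256, 1)

    def forward(self, frames):                            # frames: (1, T, FEAT_DIM)
        _, (h, _) = self.lstm(frames)
        return self.head(h[-1])                           # (1, 1) raw logit

def diversity_loss(frames, scores):
    """Score-weighted pairwise cosine similarity: pushes high-scoring frames apart."""
    f = F_.normalize(frames.squeeze(0), dim=-1)           # (T, FEAT_DIM)
    s = scores.squeeze(0)                                 # (T,)
    sim = f @ f.t()                                       # (T, T) cosine similarities
    w = s.unsqueeze(1) * s.unsqueeze(0)                   # (T, T) score weights
    mask = 1.0 - torch.eye(f.size(0))
    return (w * sim * mask).sum() / (w * mask).sum().clamp_min(1e-8)

F_net, D_net = Summarizer(), Discriminator()
opt_f = torch.optim.Adam(F_net.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D_net.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(raw_video, real_summary, lam=1.0):
    # raw_video: (1, T_v, FEAT_DIM) sampled from V; real_summary: (1, T_s, FEAT_DIM) from S.
    scores = F_net(raw_video)
    fake_summary = raw_video * scores.unsqueeze(-1)       # soft key-frame selection

    # Discriminator update: real summaries vs. generated summaries.
    opt_d.zero_grad()
    d_loss = bce(D_net(real_summary), torch.ones(1, 1)) + \
             bce(D_net(fake_summary.detach()), torch.zeros(1, 1))
    d_loss.backward()
    opt_d.step()

    # Summarizer update: fool the discriminator and keep F(V) visually diverse.
    opt_f.zero_grad()
    f_loss = bce(D_net(fake_summary), torch.ones(1, 1)) + \
             lam * diversity_loss(raw_video, scores)
    f_loss.backward()
    opt_f.step()
    return d_loss.item(), f_loss.item()
```

In practice one would batch videos, use the paper's actual summarizer model, and convert the soft frame scores into a discrete key-frame set at inference time (for example by thresholding or top-k selection); the sketch above only illustrates how the distribution-matching and diversity terms interact without paired supervision.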
