Visual Subtitle Feature Enhanced Video Outline Generation

08/24/2022
by Qi Lv, et al.

With the rapidly growing number of videos, there is great demand for techniques that help people quickly navigate to the video segments they are interested in. However, current work on video understanding mainly focuses on video content summarization, and little effort has been made to explore the structure of a video. Inspired by textual outline generation, we introduce a novel video understanding task, namely video outline generation (VOG). The task consists of two sub-tasks: (1) segmenting the video according to its content structure and (2) generating a heading for each segment. To learn and evaluate VOG, we annotate a 10k+ dataset called DuVOG. Specifically, we use OCR tools to recognize the subtitles of videos, and annotators are then asked to divide the subtitles into chapters and title each chapter. In videos, highlighted text tends to be the headline, since it is more likely to attract attention. We therefore propose a Visual Subtitle feature Enhanced video outline generation model (VSENet), which takes as input the textual subtitles together with their visual font sizes and positions. We treat the VOG task as a sequence tagging problem that extracts the spans where the headings are located and then rewrites them to form the final outlines. Furthermore, based on the similarity between video outlines and textual outlines, we use a large number of articles with chapter headings to pretrain our model. Experiments on DuVOG show that our model largely outperforms other baseline methods, achieving an F1-score of 77.1 at the video segmentation level and a ROUGE-L_F0.5 of 85.0 at the headline generation level.
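
To make the sequence tagging formulation concrete, the sketch below shows one way per-subtitle tags could be decoded into segments and heading spans. It is a minimal illustration under stated assumptions, not the VSENet implementation: the BIO-style tag names (`B-HEAD`, `I-HEAD`, `O`), the `SubtitleLine` fields, and the `decode_outline` helper are all hypothetical, and the rewriting step that turns extracted spans into final headings is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubtitleLine:
    """One OCR-recognized subtitle line (field names are assumptions)."""
    text: str
    font_size: float   # hypothetical visual feature, e.g. glyph height in pixels
    y_position: float  # hypothetical normalized vertical position (0 = top)


def decode_outline(
    lines: List[SubtitleLine], tags: List[str]
) -> List[Tuple[str, int, int]]:
    """Decode per-line tags into (heading, start_index, end_index) segments.

    Assumed tag scheme: B-HEAD opens a heading span (and a new segment),
    I-HEAD continues the open heading span, O marks body text.
    Lines that appear before the first heading are skipped.
    """
    if len(lines) != len(tags):
        raise ValueError("lines and tags must have the same length")

    segments = []  # each entry: [heading_parts, start_index, end_index]
    prev = "O"
    for i, (line, tag) in enumerate(zip(lines, tags)):
        continues = tag == "I-HEAD" and prev in ("B-HEAD", "I-HEAD")
        if tag == "B-HEAD" or (tag == "I-HEAD" and not continues):
            # A heading starts here, so a new segment starts here too.
            segments.append([[line.text], i, i])
        elif continues:
            # Extend the heading span of the current segment.
            segments[-1][0].append(line.text)
            segments[-1][2] = i
        elif segments:
            # Body line: it belongs to the most recent segment.
            segments[-1][2] = i
        prev = tag
    return [(" ".join(parts), start, end) for parts, start, end in segments]


# Toy input: larger font sizes mark the lines a tagger might label as headings.
lines = [
    SubtitleLine("Welcome back to the channel", 24.0, 0.90),
    SubtitleLine("Step one:", 48.0, 0.20),
    SubtitleLine("prepare the dough", 48.0, 0.30),
    SubtitleLine("mix flour and water", 24.0, 0.90),
    SubtitleLine("Step two: bake", 48.0, 0.20),
    SubtitleLine("bake for twenty minutes", 24.0, 0.90),
]
tags = ["O", "B-HEAD", "I-HEAD", "O", "B-HEAD", "O"]

outline = decode_outline(lines, tags)
for heading, start, end in outline:
    print(f"{heading!r}: subtitle lines {start}-{end}")
```

In this sketch the segment boundaries fall out of the heading positions, so a single tagging pass covers both sub-tasks; the font size and position fields are carried only to show what a tagger might consume as input.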

research
12/02/2022

Role of Audio in Audio-Visual Video Summarization

Video summarization attracts attention for efficient video representatio...
research
03/31/2020

Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data

The rapid increase in the amount of published visual data and the limite...
research
01/09/2023

Cursive Caption Text Detection in Videos

Textual content appearing in videos represents an interesting index for ...
research
09/26/2022

Multi-modal Video Chapter Generation

Chapter generation becomes practical technique for online videos nowaday...
research
11/09/2020

Chapter Captor: Text Segmentation in Novels

Books are typically segmented into chapters and sections, representing c...
research
10/12/2022

LiveSeg: Unsupervised Multimodal Temporal Segmentation of Long Livestream Videos

Livestream videos have become a significant part of online learning, whe...
research
01/17/2021

Narration Generation for Cartoon Videos

Research on text generation from multimodal inputs has largely focused o...
