Query-based Video Summarization with Pseudo Label Supervision

07/04/2023
by Jia-Hong Huang, et al.

Existing manually labelled datasets for query-based video summarization are costly to produce and therefore small, which limits the performance of supervised deep video summarization models. Self-supervision can address this data sparsity challenge by using a pretext task and defining a method to acquire extra data with pseudo labels to pre-train a supervised deep model. In this work, we introduce segment-level pseudo labels derived from input videos to properly model both the relationship between the pretext task and the target task and the implicit relationship between the pseudo label and the human-defined label. The pseudo labels are generated from existing human-defined frame-level labels. To create more accurate query-dependent video summaries, we propose a semantics booster that generates context-aware query representations. Furthermore, we propose a mutual attention mechanism to help capture the interactive information between the visual and textual modalities. The proposed approach is thoroughly validated on three commonly used video summarization benchmarks. Experimental results show that the proposed video summarization algorithm achieves state-of-the-art performance.
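
The abstract states that segment-level pseudo labels are generated from human-defined frame-level labels, but it does not say how. The sketch below shows one simple possibility, purely for illustration: averaging frame-level scores over consecutive fixed-length segments. The function name `segment_pseudo_labels`, the `segment_len` parameter, and the averaging rule are all assumptions and are not taken from the paper.

```python
from typing import List


def segment_pseudo_labels(frame_scores: List[float], segment_len: int) -> List[float]:
    """Average frame-level scores over consecutive fixed-length segments.

    Hypothetical helper: the averaging rule and the fixed segment length
    are illustrative assumptions, not the method described in the paper.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    labels = []
    for start in range(0, len(frame_scores), segment_len):
        # The last chunk may be shorter than segment_len.
        chunk = frame_scores[start:start + segment_len]
        labels.append(sum(chunk) / len(chunk))
    return labels


# Seven frames with binary relevance labels, grouped into 2-frame segments.
frame_scores = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
print(segment_pseudo_labels(frame_scores, segment_len=2))
# prints [1.0, 0.0, 0.5, 1.0]
```

A mean keeps the pseudo label on the same scale as the frame-level labels; other aggregations (for example a maximum or a thresholded vote) would be equally plausible under these assumptions.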

research
03/04/2023

Improving Audio-Visual Video Parsing with Pseudo Visual Labels

Audio-Visual Video Parsing is a task to predict the events that occur in...
research
04/07/2020

Query-controllable Video Summarization

When video collections become huge, how to explore both within and acros...
research
08/14/2022

TL;DW? Summarizing Instructional Videos with Task Relevance Cross-Modal Saliency

YouTube users looking for instructions for a specific task may spend a l...
research
05/01/2017

Query-adaptive Video Summarization via Quality-aware Relevance Estimation

Although the problem of automatic video summarization has recently recei...
research
07/08/2021

Use of Affective Visual Information for Summarization of Human-Centric Videos

Increasing volume of user-generated human-centric video content and thei...
research
10/06/2022

Learning functional sections in medical conversations: iterative pseudo-labeling and human-in-the-loop approach

Medical conversations between patients and medical professionals have im...
research
01/27/2021

Efficient Video Summarization Framework using EEG and Eye-tracking Signals

This paper proposes an efficient video summarization framework that will...
