Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

03/24/2023
by WonJun Moon, et al.

Video moment retrieval and highlight detection (MR/HD) have recently drawn attention as the demand for video understanding has increased drastically. The key objective of MR/HD is to localize the moment that matches a given text query and to estimate a clip-wise accordance level with that query, i.e., a saliency score. Although recent transformer-based models have brought some advances, we found that these methods do not fully exploit the information in the given query. For example, the relevance between the text query and the video content is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. Because we observe that the given query plays an insignificant role in transformer architectures, our encoding module starts with cross-attention layers that explicitly inject the context of the text query into the video representation. Then, to enhance the model's capability to exploit the query information, we manipulate the video-query pairs to produce irrelevant pairs. The model is trained to yield low saliency scores for such negative (irrelevant) video-query pairs, which in turn encourages it to estimate the accordance between video and query precisely. Lastly, we present an input-adaptive saliency predictor that adaptively defines the criterion of saliency scores for the given video-query pair. Our extensive studies verify the importance of building query-dependent representations for MR/HD. Specifically, QD-DETR outperforms state-of-the-art methods on the QVHighlights, TVSum, and Charades-STA datasets. Code is available at github.com/wjun0830/QD-DETR.
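Two of the ideas in the abstract, injecting the text query into the video representation through cross-attention and building irrelevant video-query pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cross_attend` and `make_negative_pairs` are hypothetical, the attention omits learned projections, multiple heads, and residual connections, and the roll-by-one pairing is only one assumed way to produce mismatched pairs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(clips, words):
    """Scaled dot-product cross-attention without learned projections.

    Each video clip feature acts as the attention query; the text word
    features act as keys and values, so every output row is a
    text-conditioned mixture of word features.
    """
    dim = len(words[0])
    out = []
    for clip in clips:
        scores = [
            sum(c * w for c, w in zip(clip, word)) / math.sqrt(dim)
            for word in words
        ]
        weights = softmax(scores)
        out.append([
            sum(wt * word[d] for wt, word in zip(weights, words))
            for d in range(dim)
        ])
    return out

def make_negative_pairs(videos, queries):
    """Pair each video with the next sample's query (assumed scheme).

    With a batch of one sample the pair is unchanged, so a real
    training loop would need at least two distinct samples.
    """
    shifted = queries[1:] + queries[:1]
    return list(zip(videos, shifted))

clips = [[1.0, 0.0], [0.0, 1.0]]   # two toy clip features
words = [[1.0, 0.0], [0.0, 1.0]]   # two toy word features
conditioned = cross_attend(clips, words)
negatives = make_negative_pairs(["v1", "v2", "v3"], ["q1", "q2", "q3"])
```

In this toy setup each clip leans toward the word it is most similar to, and each negative pair would be given a low target saliency during training so that the model learns to score mismatched queries low.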

research
06/05/2023

Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval

Video moment retrieval (VMR) aims to identify the specific moment in an ...
research
07/20/2021

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Detecting customized moments and highlights from videos given natural la...
research
06/25/2021

Video Moment Retrieval with Text Query Considering Many-to-Many Correspondence Using Potentially Relevant Pair

In this paper we undertake the task of text-based video moment retrieval...
research
09/01/2020

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

The query-based moment retrieval is a problem of localising a specific c...
research
06/03/2021

Deconfounded Video Moment Retrieval with Causal Intervention

We tackle the task of video moment retrieval (VMR), which aims to locali...
research
08/19/2020

Generating Adjacency Matrix for Video-Query based Video Moment Retrieval

In this paper, we continue our work on Video-Query based Video Moment re...
research
07/20/2020

Graph Neural Network for Video-Query based Video Moment Retrieval

In this paper, we focus on Video Query based Video Moment Retrieval (VQ-...