Multi-query Video Retrieval

01/10/2022
by   Zeyu Wang, et al.

Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. In this paper, we focus on the less-studied setting of multi-query video retrieval, where multiple queries are provided to the model for searching over the video archive. We first show that the multi-query retrieval task is more pragmatic, more representative of real-world use cases, and a better test of the retrieval capabilities of current models, and that it therefore deserves further investigation alongside the more prevalent single-query retrieval setup. We then propose several new methods that leverage multiple queries at training time, improving on the simple approach of combining the similarity outputs of multiple queries from models trained in the regular single-query setting. Our models consistently outperform several competitive baselines on three different datasets. For instance, Recall@1 improves by 4.7 points on MSR-VTT, 4.1 points on MSVD, and 11.7 points on VATEX over a strong baseline built on the state-of-the-art CLIP4Clip model. We believe further modeling efforts will bring new insights to this direction and spark new systems that perform better in real-world video retrieval applications. Code is available at https://github.com/princetonvisualai/MQVR.
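
To make the baseline idea concrete, the following is a minimal sketch of combining the similarity outputs of several queries at inference time and scoring the result with Recall@1. It is not taken from the paper or its repository: the abstract does not say how the similarities are combined, so plain averaging is an assumption here, the function names (`mean_similarity`, `top1`, `recall_at_1`) are hypothetical, and the scores are made-up toy values.

```python
from typing import List, Sequence


def mean_similarity(per_query_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-video similarity scores over all queries.

    Assumption: averaging is only one possible way to combine the scores.
    per_query_scores[q][v] is the similarity of query q to video v.
    """
    if not per_query_scores:
        raise ValueError("at least one query is required")
    num_videos = len(per_query_scores[0])
    if any(len(row) != num_videos for row in per_query_scores):
        raise ValueError("every query must score the same set of videos")
    num_queries = len(per_query_scores)
    return [
        sum(row[v] for row in per_query_scores) / num_queries
        for v in range(num_videos)
    ]


def top1(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring video."""
    return max(range(len(scores)), key=lambda v: scores[v])


def recall_at_1(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Percentage of retrievals whose top-ranked video is the target."""
    hits = sum(1 for p, t in zip(predictions, targets) if p == t)
    return 100.0 * hits / len(targets)


# Toy example (made-up numbers): three queries describe target video 0.
scores_for_target = [
    [0.30, 0.35, 0.10],  # query 1 alone ranks video 1 first (wrong)
    [0.40, 0.20, 0.15],  # query 2 ranks video 0 first
    [0.35, 0.30, 0.20],  # query 3 ranks video 0 first
]

single_query_prediction = top1(scores_for_target[0])
combined = mean_similarity(scores_for_target)
multi_query_prediction = top1(combined)

print("single-query R@1:", recall_at_1([single_query_prediction], [0]))
print("multi-query R@1:", recall_at_1([multi_query_prediction], [0]))
```

In this toy case the first query on its own retrieves the wrong video, while the averaged scores recover the target. The methods proposed in the paper go further by using multiple queries at training time, which this sketch does not attempt to reproduce.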


research
08/22/2023

Multi-event Video-Text Retrieval

Video-Text Retrieval (VTR) is a crucial multi-modal task in an era of ma...
research
03/18/2021

On Semantic Similarity in Video Retrieval

Current video retrieval efforts all found their evaluation on an instanc...
research
03/17/2023

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

Existing text-video retrieval solutions are, in essence, discriminant mo...
research
04/28/2023

Click-Feedback Retrieval

Retrieving target information based on input query is of fundamental imp...
research
04/27/2022

Relevance-based Margin for Contrastively-trained Video Retrieval Models

Video retrieval using natural language queries has attracted increasing ...
research
09/30/2020

Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval

This paper tackles a new problem in computer vision: mid-stream video-to...
research
10/07/2019

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels

Many real-world video analysis applications require the ability to ident...
