ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation

10/11/2021
by Aiden Seungjoon Lee, et al.

Video-text retrieval has many real-world applications, such as media analytics, surveillance, and robotics. This paper presents the first-place solution to the video retrieval track of the ICCV VALUE Challenge 2021. We present a simple yet effective approach that jointly tackles two video-text retrieval tasks (video retrieval and video corpus moment retrieval) by leveraging a model trained only on the video retrieval task. In addition, we create an ensemble model that achieves new state-of-the-art performance on all four datasets (TVr, How2r, YouCook2r, and VATEXr) presented in the VALUE Challenge.
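
The abstract does not describe the method's internals. As a rough illustration of how a model trained only for video retrieval might be reused for moment retrieval through fine-grained segmentation, here is a minimal Python sketch under stated assumptions: each video is a list of per-frame feature vectors, a candidate segment is represented by mean-pooling its frames, and the query-segment score is cosine similarity. The function names, the sliding-window sizes, and mean pooling are hypothetical stand-ins, not the authors' implementation; in a real system the score would come from the trained video retrieval model.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Segment = Tuple[int, int]  # [start, end) frame indices


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; stands in for the retrieval model's text-video score (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_pool(frames: Sequence[Vector]) -> List[float]:
    """Hypothetical segment embedding: average of the per-frame features."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def candidate_segments(num_frames: int, window_sizes: Sequence[int], stride: int) -> List[Segment]:
    """Fine-grained segmentation via sliding windows of several lengths."""
    segments = []
    for w in window_sizes:
        if w > num_frames:
            continue
        for start in range(0, num_frames - w + 1, stride):
            segments.append((start, start + w))
    # Fall back to the whole video if every window is longer than the video.
    return segments or [(0, num_frames)]


def retrieve_moment(query: Vector, frames: Sequence[Vector],
                    window_sizes: Sequence[int] = (2, 4, 8),
                    stride: int = 1) -> Tuple[Optional[Segment], float]:
    """Score every candidate segment against the query and keep the best one."""
    best, best_score = None, -math.inf
    for start, end in candidate_segments(len(frames), window_sizes, stride):
        score = cosine(query, mean_pool(frames[start:end]))
        if score > best_score:  # strict '>' keeps the earliest segment on ties
            best, best_score = (start, end), score
    return best, best_score


def retrieve_moment_in_corpus(query: Vector, corpus: Dict[str, Sequence[Vector]],
                              window_sizes: Sequence[int] = (2, 4, 8),
                              stride: int = 1) -> List[Tuple[str, Optional[Segment], float]]:
    """Corpus-level moment retrieval: best segment per video, videos ranked by that score."""
    results = []
    for video_id, frames in corpus.items():
        segment, score = retrieve_moment(query, frames, window_sizes, stride)
        results.append((video_id, segment, score))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

For example, with `frames = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]]` and the query `[0, 1]`, `retrieve_moment(query, frames, window_sizes=(2, 4))` returns the segment `(2, 4)` with score 1.0. Ranking videos by their best segment score is one simple way to let a single scoring function serve both video retrieval and corpus moment retrieval; averaging the scores of several such models would be one possible ensembling choice, again an assumption rather than the paper's stated procedure.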

research
05/17/2022

A CLIP-Hitchhiker's Guide to Long Video Retrieval

Our goal in this paper is the adaptation of image-text models for long v...
research
11/18/2020

A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus

Identifying a short segment in a long video that semantically matches a ...
research
03/29/2023

Hierarchical Video-Moment Retrieval and Step-Captioning

There is growing interest in searching for information from large video ...
research
03/19/2021

MDMMT: Multidomain Multimodal Transformer for Video Retrieval

We present a new state-of-the-art on the text to video retrieval task on...
research
04/12/2023

TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval

3D object retrieval is an important yet challenging task, which has draw...
research
02/24/2021

A Straightforward Framework For Video Retrieval Using CLIP

Video Retrieval is a challenging task where a text query is matched to a...
research
06/07/2023

An Overview of Challenges in Egocentric Text-Video Retrieval

Text-video retrieval contains various challenges, including biases comin...