A CLIP-Hitchhiker's Guide to Long Video Retrieval

05/17/2022
by   Max Bain, et al.

Our goal in this paper is to adapt image-text models for long video retrieval. Recent works have demonstrated state-of-the-art video retrieval performance by adopting CLIP, effectively hitchhiking on the image-text representation for video tasks. However, there has been limited success in learning temporal aggregation that outperforms mean-pooling of the image-level representations extracted per frame by CLIP. We find that a simple yet effective baseline, the weighted mean of frame embeddings via query-scoring, is a significant improvement over all prior temporal modelling attempts and over mean-pooling. In doing so, we provide an improved baseline for others to compare against and demonstrate state-of-the-art performance of this simple baseline on a suite of long video retrieval benchmarks.
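
To make the contrast between mean-pooling and a query-scored weighted mean concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the query scores are turned into weights, so the softmax over cosine similarities, the `temperature` parameter, and the function names `query_scored_pool` and `mean_pool` are all hypothetical choices. The embeddings are toy 2-D vectors standing in for per-frame CLIP image embeddings and a CLIP text embedding.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(frame_embs):
    """Baseline: unweighted mean of the normalized frame embeddings."""
    frames = [l2_normalize(f) for f in frame_embs]
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def query_scored_pool(frame_embs, query_emb, temperature=0.1):
    """Weighted mean of frame embeddings, weighted by similarity to the query.

    Assumption: weights are a softmax over per-frame cosine similarities,
    sharpened by `temperature`. The source only says "query-scoring".
    """
    frames = [l2_normalize(f) for f in frame_embs]
    query = l2_normalize(query_emb)
    sims = [dot(f, query) for f in frames]
    # Subtract the max before exponentiating for numerical stability.
    top = max(sims)
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frames))
        for d in range(len(query))
    ]
    return pooled, weights


# Toy example: only the middle frame matches the query.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = [0.0, 1.0]

pooled, weights = query_scored_pool(frames, query)
weighted_score = dot(pooled, l2_normalize(query))
mean_score = dot(mean_pool(frames), l2_normalize(query))
```

In this toy case the query-scored pooling concentrates almost all of the weight on the one matching frame, so `weighted_score` exceeds `mean_score`, where the matching frame is diluted by the two irrelevant ones. As `temperature` grows large, the weights approach uniform and the result approaches plain mean-pooling.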

10/11/2021

ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation

Video-text retrieval has many real-world applications such as media anal...
11/09/2020

Learning the Best Pooling Strategy for Visual Semantic Embedding

Visual Semantic Embedding (VSE) is a dominant approach for vision-langua...
08/09/2016

Mean Box Pooling: A Rich Image Representation and Output Embedding for the Visual Madlibs Task

We present Mean Box Pooling, a novel visual representation that pools ov...
11/24/2021

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

A great challenge in video-language (VidL) modeling lies in the disconne...
11/14/2014

A Discriminative CNN Video Representation for Event Detection

In this paper, we propose a discriminative video representation for even...
10/24/2011

Quels formalismes temporels pour représenter des connaissances extraites de textes de recettes de cuisine ?

The Taaable project goal is to create a case-based reasoning system for r...
02/24/2021

A Straightforward Framework For Video Retrieval Using CLIP

Video Retrieval is a challenging task where a text query is matched to a...