Self-supervised Video Retrieval Transformer Network

04/16/2021
by Xiangteng He, et al.

Content-based video retrieval aims to find videos in a large video database that are similar to, or even near-duplicates of, a given query video. Video representation and similarity search algorithms are crucial to any video retrieval system. To derive effective video representations, most video retrieval systems require a large amount of manually annotated training data, which is costly and inefficient. In addition, most retrieval systems rely on frame-level features for similarity search, which is expensive in both storage and search time. We propose a novel video retrieval system, termed SVRTN, that addresses these shortcomings. It first applies self-supervised training to learn video representations from unlabeled data, avoiding the cost of manual annotation. It then uses a transformer structure to aggregate frame-level features into clip-level features, reducing both storage space and search complexity. The transformer learns complementary and discriminative information from the interactions among clip frames, and it acquires invariance to frame permutation and to missing frames, which supports more flexible retrieval. Comprehensive experiments on two challenging video retrieval datasets, FIVR-200K and SVD, verify the effectiveness of the proposed SVRTN method, which achieves the best video retrieval performance in both accuracy and efficiency.
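
The aggregation step described in the abstract can be illustrated with a small sketch. This is not the authors' SVRTN architecture: it assumes a single self-attention layer with identity projections (no learned weights), no positional encoding, and mean pooling, and the name aggregate_clip is hypothetical. It shows only two properties that follow from this construction: several frame-level vectors collapse into one clip-level vector, and the result does not depend on frame order. Robustness to missing frames is something the paper attributes to training and is not demonstrated here.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def aggregate_clip(frames):
    """Aggregate frame-level features into one clip-level feature.

    Assumption: identity query/key/value projections and no positional
    encoding, so the output is order-independent by construction.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    attended = []
    for query in frames:
        # Each frame attends to every frame in the clip.
        weights = softmax([dot(query, key) / scale for key in frames])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, frames))
             for i in range(dim)]
        )
    # Mean-pool the attended frames, then L2-normalize the clip vector.
    pooled = [sum(row[i] for row in attended) / len(attended)
              for i in range(dim)]
    norm = math.sqrt(dot(pooled, pooled)) or 1.0
    return [x / norm for x in pooled]


random.seed(0)
# Five synthetic 8-dimensional frame features standing in for CNN outputs.
frames = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]

clip = aggregate_clip(frames)
clip_reversed = aggregate_clip(frames[::-1])

# Largest difference between the two clip vectors (floating-point noise only).
max_gap = max(abs(a - b) for a, b in zip(clip, clip_reversed))
print(len(frames), "frame vectors ->", 1, "clip vector of length", len(clip))
print("order-independent:", max_gap < 1e-9)
```

Under these assumptions, an index stores one vector per clip instead of one per frame, which is the source of the storage and search savings the abstract describes.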

research
11/10/2022

3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval

In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicat...
research
05/18/2022

VRAG: Region Attention Graphs for Content-Based Video Retrieval

Content-based Video Retrieval (CBVR) is used on media-sharing platforms ...
research
08/04/2020

Temporal Context Aggregation for Video Retrieval with Contrastive Learning

The current research focus on Content-Based Video Retrieval requires hig...
research
03/15/2023

VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression

In content-based video retrieval (CBVR), dealing with large-scale collec...
research
10/22/2018

Where is this? Video geolocation based on neural network features

In this work we propose a method that geolocates videos within a delimit...
research
04/05/2016

Counting Grid Aggregation for Event Retrieval and Recognition

Event retrieval and recognition in a large corpus of videos necessitates...
research
12/27/2017

A Robust Zero-Watermark Scheme with Similarity-based Retrieval for Copyright Protection of 3D Video

The copyright protection of 3D videos has become a crucial issue. In thi...
