Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder

02/07/2018
by   Jingkuan Song, et al.
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization. These stages do not adequately exploit the temporal order of video frames in a joint binary optimization model, which results in severe information loss. In this paper, we propose a novel unsupervised video hashing framework, dubbed Self-Supervised Video Hashing (SSVH), that captures the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture that generates binary codes for videos; and 2) how to make those binary codes support accurate video retrieval. We design a hierarchical binary autoencoder that models the temporal dependencies in videos at multiple granularities and embeds the videos into binary codes with fewer computations than the stacked architecture. We then encourage the binary codes to simultaneously reconstruct the visual content and the neighborhood structure of the videos. Experiments on two real-world datasets (FCVID and YFCC) show that SSVH significantly outperforms state-of-the-art methods and achieves the best current performance on the task of unsupervised video retrieval.
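To make the retrieval setting concrete, the following is a minimal sketch of how binary codes are typically used once an encoder has produced them: real-valued outputs are thresholded into bits, and database videos are ranked by Hamming distance to the query. It is not the SSVH model itself. The sign-threshold binarization, the function names (binarize, hamming, rank_by_hamming), and the 4-dimensional toy vectors are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the sign bits of a real-valued vector into one integer.

    Bit i is set when features[i] > 0. Sign thresholding is an assumed,
    commonly used binarization step, not necessarily the paper's.
    """
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two packed binary codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    return sorted(range(len(database)), key=lambda idx: hamming(query, database[idx]))


# Hypothetical encoder outputs for three database videos and one query video.
database_codes = [
    binarize(v)
    for v in (
        [0.9, -0.2, 0.4, -0.7],
        [-0.5, 0.3, -0.1, 0.8],
        [0.6, -0.4, -0.3, -0.9],
    )
]
query_code = binarize([0.7, -0.1, 0.5, -0.6])
ranking = rank_by_hamming(query_code, database_codes)
print(ranking)  # [0, 2, 1]
```

In this toy example the query shares all four bits with the first database video, differs in one bit from the third, and differs in all four bits from the second, so the ranking is 0, 2, 1. Packing the bits into an integer keeps the distance computation to a single XOR plus a bit count, which is the property that makes short binary codes attractive for large-scale retrieval.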

research
04/24/2019

Simultaneous Feature Aggregating and Hashing for Compact Binary Code Learning

Representing images by compact hash codes is an attractive approach for ...
research
11/20/2019

Video Segment Copy Detection Using Memory Constrained Hierarchical Batch-Normalized LSTM Autoencoder

In this report, we introduce a video hashing method for scalable video s...
research
04/14/2020

Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning

Existing unsupervised video-to-video translation methods fail to produce...
research
09/30/2020

Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval

This paper tackles a new problem in computer vision: mid-stream video-to...
research
02/27/2020

Auto-Encoding Twin-Bottleneck Hashing

Conventional unsupervised hashing methods usually take advantage of simi...
research
12/09/2021

Self-Supervised Keypoint Discovery in Behavioral Videos

We propose a method for learning the posture and structure of agents fro...
research
11/20/2022

Simultaneously Learning Robust Audio Embeddings and balanced Hash codes for Query-by-Example

Audio fingerprinting systems must efficiently and robustly identify quer...