CLIP Multi-modal Hashing: A new baseline CLIPMH

08/22/2023
by Jian Zhu, et al.

Multi-modal hashing is widely used in multimedia retrieval because it can fuse multi-source data into a binary hash code. However, current multi-modal hashing methods suffer from low retrieval accuracy. The reason is that their individual backbone networks have limited feature expression capabilities and are not jointly pre-trained on large-scale unsupervised multi-modal data. To solve this problem, we propose a new baseline, the CLIP Multi-modal Hashing (CLIPMH) method. It uses the CLIP model to extract text and image features, and then fuses them to generate the hash code. CLIP improves the expressiveness of each modality's features, which in turn greatly improves the retrieval performance of multi-modal hashing methods. In comparison to state-of-the-art unsupervised and supervised multi-modal hashing methods, experiments reveal that the proposed CLIPMH can significantly enhance performance (maximum increase of 8.38%) [...] the text and visual backbone networks commonly used before.
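The pipeline described in the abstract (extract image and text features, fuse them, emit a binary code) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: random vectors stand in for CLIP features, the fusion is a plain concatenation, the hash layer is a fixed random projection rather than a learned one, and the names `fuse_and_hash` and `hamming_distance` as well as the dimensions are hypothetical.

```python
import numpy as np


def fuse_and_hash(image_feats, text_feats, projection):
    """Fuse per-item image and text features, then binarize them.

    Assumption: fusion is a simple concatenation and the hash layer is a
    linear projection followed by thresholding at zero. The paper's actual
    fusion and hashing modules may differ.
    """
    fused = np.concatenate([image_feats, text_feats], axis=1)
    logits = fused @ projection
    # One bit per projected dimension: 1 if positive, 0 otherwise.
    return (logits > 0).astype(np.uint8)


def hamming_distance(query_code, database_codes):
    """Count differing bits between one query code and every database code."""
    return np.count_nonzero(database_codes != query_code, axis=1)


rng = np.random.default_rng(0)
n_items, feat_dim, n_bits = 5, 512, 64  # illustrative sizes, not from the paper

# Stand-ins for features that a CLIP image encoder and text encoder would produce.
image_feats = rng.standard_normal((n_items, feat_dim))
text_feats = rng.standard_normal((n_items, feat_dim))

# Hypothetical fixed random projection in place of a trained hash layer.
projection = rng.standard_normal((2 * feat_dim, n_bits))

codes = fuse_and_hash(image_feats, text_feats, projection)

# Retrieval: rank database items by Hamming distance to the first item's code.
distances = hamming_distance(codes[0], codes)
ranking = np.argsort(distances, kind="stable")
```

Retrieval over binary codes reduces to comparing bits, which is why hashing is attractive for large multimedia collections. In a real system the stand-in features would come from a pre-trained CLIP model and the projection would be trained.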

