Deep Unsupervised Contrastive Hashing for Large-Scale Cross-Modal Text-Image Retrieval in Remote Sensing

01/20/2022
by Georgii Mikriukov et al.

Due to the availability of large-scale multi-modal data archives (e.g., satellite images acquired by different sensors, text sentences, etc.), the development of cross-modal retrieval systems that can search and retrieve semantically relevant data across different modalities based on a query in any modality has attracted great attention in remote sensing (RS). In this paper, we focus on cross-modal text-image retrieval, where queries from one modality (e.g., text) can be matched to archive entries from another (e.g., image). Most existing cross-modal text-image retrieval systems require a large number of labeled training samples and, due to their intrinsic characteristics, do not allow fast and memory-efficient retrieval. These issues limit the applicability of existing cross-modal retrieval systems to large-scale RS applications. To address this problem, in this paper we introduce a novel deep unsupervised cross-modal contrastive hashing (DUCH) method for RS text-image retrieval. The proposed DUCH consists of two main modules: 1) a feature extraction module, which extracts deep representations of the text and image modalities; and 2) a hashing module, which learns to generate cross-modal binary hash codes from the extracted representations. Within the hashing module, we introduce a novel multi-objective loss function that includes: i) contrastive objectives that enable similarity preservation both within and across modalities; ii) an adversarial objective enforced across the two modalities for cross-modal representation consistency; and iii) binarization objectives for generating representative hash codes. Experimental results show that the proposed DUCH outperforms state-of-the-art unsupervised cross-modal hashing methods on two multi-modal (image and text) benchmark archives in RS. Our code is publicly available at https://git.tu-berlin.de/rsim/duch.
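To make the multi-objective loss concrete, the sketch below combines a batch-wise inter-modal contrastive term (NT-Xent-style, with matching image-text pairs as positives) and a binarization (quantization) term that pushes continuous codes toward {-1, +1}. This is an illustrative NumPy sketch under assumed conventions, not the authors' implementation: the actual DUCH loss additionally contains intra-modal contrastive and adversarial objectives, and all function and variable names here are hypothetical.

```python
import numpy as np

def contrastive_loss(a, b, tau=0.5):
    """NT-Xent-style inter-modal term (illustrative): matching
    (image, text) pairs on the diagonal are positives, every other
    pairing in the batch acts as a negative."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # L2-normalize rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                             # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # positives on the diagonal

def quantization_loss(h):
    """Binarization objective: penalize distance of each continuous
    code entry from the binary values -1 and +1."""
    return np.mean((np.abs(h) - 1.0) ** 2)

# Toy continuous hash codes for a batch of 8 image-text pairs (16 bits).
rng = np.random.default_rng(0)
img_codes = np.tanh(rng.normal(size=(8, 16)))
txt_codes = np.tanh(rng.normal(size=(8, 16)))

# Combined objective (adversarial and intra-modal terms omitted here).
loss = (contrastive_loss(img_codes, txt_codes)
        + quantization_loss(img_codes)
        + quantization_loss(txt_codes))
```

At retrieval time, codes would be binarized with `sign(h)` so that similarity search reduces to fast Hamming-distance comparisons, which is what makes hashing memory-efficient at archive scale.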


