Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval

08/08/2017
by Yuming Shen, et al.

Cross-modal hashing is widely regarded as an effective technique for large-scale textual-visual cross retrieval, in which data from different modalities are mapped into a shared Hamming space for matching. Most traditional textual-visual binary encoding methods consider only holistic image representations and fail to model descriptive sentences, which makes them ill-suited to the rich semantics of informative cross-modal data in high-quality textual-visual search tasks. To address the problem of hashing cross-modal data with semantic-rich cues, this paper develops a novel integrated deep architecture, named Textual-Visual Deep Binaries (TVDB), that effectively encodes the detailed semantics of informative images and long descriptive sentences. In particular, region-based convolutional networks with long short-term memory units are introduced to explore regional image details, while the semantic cues of sentences are modeled by a text convolutional network. Additionally, we propose a stochastic batch-wise training routine in which high-quality binary codes and deep encoding functions are efficiently optimized in an alternating manner. Experiments are conducted on three multimedia datasets, i.e., Microsoft COCO, IAPR TC-12, and INRIA Web Queries, where the proposed TVDB model significantly outperforms state-of-the-art binary coding methods on cross-modal retrieval.
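
To make the idea of matching in a shared Hamming space concrete, the following is a minimal sketch of the retrieval step only, not the TVDB model itself. It assumes that some image encoder and some sentence encoder have already produced fixed-length binary codes; the 8-bit codes and the helper names (`pack_bits`, `hamming`, `rank_by_hamming`) are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a sequence of 0/1 values into a single integer code."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Return database indices ordered by Hamming distance to the query.

    sorted() is stable, so ties keep their original index order.
    """
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Made-up 8-bit codes standing in for the output of an image encoder.
image_codes = [
    pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),
    pack_bits([0, 1, 0, 0, 1, 1, 0, 1]),
    pack_bits([1, 0, 1, 0, 0, 0, 1, 0]),
]
# Made-up code standing in for the output of a sentence encoder.
text_query = pack_bits([1, 0, 1, 1, 0, 0, 1, 1])

# Text-to-image retrieval: nearest image codes first.
ranking = rank_by_hamming(text_query, image_codes)
print(ranking)
```

Because both modalities share one code space, the same ranking function serves image-to-text retrieval by swapping the roles of query and database. Distance computation reduces to an XOR and a bit count, which is what makes binary codes attractive at large scale.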


research
04/25/2019

Fusion-supervised Deep Cross-modal Hashing

Deep hashing has recently received attention in cross-modal retrieval fo...
research
02/22/2016

Correlation Hashing Network for Efficient Cross-Modal Retrieval

Hashing is widely applied to approximate nearest neighbor search for lar...
research
04/10/2020

Stacked Convolutional Deep Encoding Network for Video-Text Retrieval

Existing dominant approaches for cross-modal video-text retrieval task a...
research
03/01/2023

Cross-Modal Entity Matching for Visually Rich Documents

Visually rich documents (VRD) are physical/digital documents that utiliz...
research
08/23/2018

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Cross-modal retrieval between visual data and natural language descripti...
research
02/07/2020

Deep Robust Multilevel Semantic Cross-Modal Hashing

Hashing based cross-modal retrieval has recently made significant progre...
research
08/05/2021

Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval

The current state-of-the-art image-sentence retrieval methods implicitly...
