Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network

08/16/2017
by Yuxin Peng, et al.

Cross-modal retrieval plays an indispensable role in flexibly finding information across different modalities of data, and effectively measuring the similarity between modalities is its key challenge. Modalities such as image and text have imbalanced and complementary relationships: they carry unequal amounts of information when describing the same semantics. For example, images often contain details that textual descriptions cannot convey, and vice versa. Existing works based on Deep Neural Networks (DNN) mostly construct one common space for all modalities to find latent alignments between them, which sacrifices the exclusive characteristics of each modality. In contrast, we propose a modality-specific cross-modal similarity measurement (MCSM) approach that constructs an independent semantic space for each modality and adopts an end-to-end framework to directly generate modality-specific cross-modal similarities without an explicit common representation. In each semantic space, the modality-specific characteristics of one modality are fully exploited by a recurrent attention network, while data from the other modality are projected into this space via attention-based joint embedding, so that the learned attention weights guide fine-grained cross-modal correlation learning and capture the imbalanced and complementary relationships between modalities. Finally, the complementarity between the two semantic spaces is exploited by adaptively fusing the modality-specific cross-modal similarities to perform retrieval. Experiments on the widely used Wikipedia and Pascal Sentence datasets, as well as our large-scale XMediaNet dataset, verify the effectiveness of the proposed approach, which outperforms 9 state-of-the-art methods.
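To make the described architecture concrete, below is a minimal PyTorch sketch of the idea: one recurrent attention encoder per modality-specific space, a projection of the other modality into that space, and a fusion of the two similarities. All module names, feature dimensions (2048-d image regions, 300-d words), and the scalar fusion weight are illustrative assumptions, not the authors' implementation; in particular, the paper's attention-guided joint embedding and adaptive fusion are simplified here.

```python
# Hypothetical sketch of the MCSM idea; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentAttention(nn.Module):
    """LSTM over a sequence of local features, with soft attention
    that pools them into one vector and returns the attention weights."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hid_dim, batch_first=True)
        self.att = nn.Linear(hid_dim, 1)

    def forward(self, x):                                   # x: (B, T, in_dim)
        h, _ = self.lstm(x)                                  # (B, T, hid_dim)
        w = torch.softmax(self.att(h).squeeze(-1), dim=1)    # (B, T)
        pooled = torch.bmm(w.unsqueeze(1), h).squeeze(1)     # (B, hid_dim)
        return pooled, w

class ModalitySpecificSpace(nn.Module):
    """One semantic space: the anchor modality is encoded with recurrent
    attention; the other modality is projected into the same space."""
    def __init__(self, anchor_dim, other_dim, hid_dim):
        super().__init__()
        self.encoder = RecurrentAttention(anchor_dim, hid_dim)
        self.project = nn.Linear(other_dim, hid_dim)

    def forward(self, anchor_seq, other_feat):
        anchor_vec, att_w = self.encoder(anchor_seq)
        other_vec = self.project(other_feat)
        sim = F.cosine_similarity(anchor_vec, other_vec, dim=1)  # (B,)
        return sim, att_w

class MCSMSketch(nn.Module):
    """Two modality-specific spaces plus a learnable fusion of the two
    cross-modal similarities (a scalar weight here; the paper's fusion
    is adaptive)."""
    def __init__(self, img_dim=2048, txt_dim=300, hid_dim=512):
        super().__init__()
        self.img_space = ModalitySpecificSpace(img_dim, txt_dim, hid_dim)
        self.txt_space = ModalitySpecificSpace(txt_dim, img_dim, hid_dim)
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, img_regions, img_global, txt_words, txt_global):
        sim_img, _ = self.img_space(img_regions, txt_global)
        sim_txt, _ = self.txt_space(txt_words, img_global)
        a = torch.sigmoid(self.alpha)
        return a * sim_img + (1 - a) * sim_txt

# Usage with random features: 4 image-text pairs, 36 regions, 20 words.
model = MCSMSketch()
sims = model(torch.randn(4, 36, 2048), torch.randn(4, 2048),
             torch.randn(4, 20, 300), torch.randn(4, 300))
```

Note that each similarity is computed inside its own semantic space, without any shared common representation; only the final fused score couples the two spaces, mirroring the paper's design of fusing modality-specific similarities rather than learning one joint embedding.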


Related research

04/07/2017 · CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network
Cross-modal retrieval has become a highlighted research topic for retrie...

06/22/2015 · Modality-dependent Cross-media Retrieval
In this paper, we investigate the cross-media retrieval between images a...

06/25/2021 · Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval
Cross-modal retrieval aims to enable flexible retrieval experience by co...

11/01/2019 · Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning
This paper addresses the challenging task of video captioning which aims...

07/15/2022 · COEM: Cross-Modal Embedding for MetaCell Identification
Metacells are disjoint and homogeneous groups of single-cell profiles, r...

10/11/2022 · Cross-modal Search Method of Technology Video based on Adversarial Learning and Feature Fusion
Technology videos contain rich multi-modal information. In cross-modal i...
