Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval

06/25/2021
by Xueying Chen, et al.

Cross-modal retrieval aims to enable a flexible retrieval experience by combining multimedia data such as images, video, text, and audio. A core task for unsupervised approaches is to mine the correlations among different object representations so as to achieve satisfactory retrieval performance without requiring expensive labels. In this paper, we propose a Graph Pattern Loss based Diversified Attention Network (GPLDAN) for unsupervised cross-modal retrieval that analyzes these correlations in depth. First, we propose a diversified attention feature projector that considers the interaction between different representations to generate multiple representations of each instance. Then, we design a novel graph pattern loss that explores the correlations among these representations through a graph in which all possible distances between different representations are considered. In addition, a modality classifier is added to explicitly identify the modality of each feature before fusion, guiding the network to improve its discrimination ability. We evaluate GPLDAN on four public datasets, and the experimental results show that it is competitive with state-of-the-art cross-modal retrieval methods.
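The abstract states only that the graph pattern loss considers all possible distances between different representations; it gives no formula. As a rough illustration, the sketch below treats every representation of one matched image-text pair as a node of a complete graph and averages the Euclidean distance over all of its edges. The function names (`all_pair_distances`, `graph_pattern_loss_sketch`), the Euclidean metric, and the uniform averaging are hypothetical choices made for this example, not the paper's definition.

```python
import itertools
import math
from typing import List, Sequence

Vector = Sequence[float]


def all_pair_distances(nodes: List[Vector]) -> List[List[float]]:
    """Symmetric distance matrix of a complete graph whose nodes are representations."""
    n = len(nodes)
    dist = [[0.0] * n for _ in range(n)]
    # combinations() yields each unordered pair (i, j), i < j, exactly once.
    for i, j in itertools.combinations(range(n), 2):
        d = math.dist(nodes[i], nodes[j])  # Euclidean distance (assumed metric)
        dist[i][j] = dist[j][i] = d
    return dist


def graph_pattern_loss_sketch(image_reps: List[Vector], text_reps: List[Vector]) -> float:
    """Hypothetical loss: mean distance over every edge of the complete graph.

    image_reps / text_reps hold the multiple representations of ONE matched
    image-text pair (e.g. one per attention branch). Uniform averaging over
    all edges is an illustrative assumption, not GPLDAN's actual weighting.
    """
    nodes = list(image_reps) + list(text_reps)
    n = len(nodes)
    if n < 2:
        return 0.0  # a single node has no edges
    dist = all_pair_distances(nodes)
    edges = list(itertools.combinations(range(n), 2))
    # Both intra-modal and cross-modal edges are included, i.e. "all possible distances".
    return sum(dist[i][j] for i, j in edges) / len(edges)
```

For instance, `graph_pattern_loss_sketch([[0, 0], [1, 0]], [[0, 1]])` builds a three-node graph with edges of length 1, 1 and sqrt(2), so it returns (2 + sqrt(2)) / 3, about 1.138. A real implementation would likely operate on batched tensors and may weight intra-modal and cross-modal edges differently.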

research
10/26/2021

Learning Explicit and Implicit Latent Common Spaces for Audio-Visual Cross-Modal Retrieval

Learning common subspace is prevalent way in cross-modal retrieval to so...
research
08/16/2017

Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network

Nowadays, cross-modal retrieval plays an indispensable role to flexibly ...
research
08/08/2020

Cross-modal Center Loss

Cross-modal retrieval aims to learn discriminative and modal-invariant f...
research
11/02/2022

CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

Radiology report generation (RRG) has gained increasing research attenti...
research
09/19/2022

Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising

The advancement of the communication technology and the popularity of th...
research
04/06/2023

Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

Cross-modal retrieval methods are the preferred tool to search databases...
