Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval

07/16/2020
by Christopher Thomas, et al.

The abundance of multimodal data (e.g. social media posts) has inspired interest in cross-modal retrieval methods. Popular approaches rely on a variety of metric learning losses, which prescribe what the proximity of image and text should be in the learned space. However, most prior methods have focused on the case where image and text convey redundant information; in contrast, real-world image-text pairs convey complementary information with little overlap. Further, images in news articles and media portray topics in a visually diverse fashion; thus, special care is needed to ensure a meaningful image representation. We propose novel within-modality losses which encourage semantic coherency in both the text and image subspaces; such semantic coherency does not necessarily align with visual coherency. Our method ensures not only that paired images and texts are close, but also that the expected image-image and text-text relationships are observed. Our approach improves cross-modal retrieval results on four datasets compared to five baselines.
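The loss structure described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a standard hinge-based triplet formulation with cosine distance, combining cross-modal terms (paired image and text should be closer than mismatched pairs) with within-modality terms (semantically related images, and semantically related texts, should stay close in their own subspaces). All function names and the margin value are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cos_dist(u, v):
    # Cosine distance in [0, 2]; small epsilon guards against zero vectors.
    return 1.0 - dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + 1e-12)

def triplet(anchor, positive, negative, margin=0.2):
    # Hinge loss: the positive must be closer to the anchor than the
    # negative by at least `margin`, else a penalty accrues.
    return max(0.0, margin + cos_dist(anchor, positive) - cos_dist(anchor, negative))

def cross_modal_loss(img, txt, img_neg, txt_neg, margin=0.2):
    # Symmetric cross-modal terms: paired image/text embeddings should be
    # closer to each other than to mismatched samples from the other modality.
    return (triplet(img, txt, txt_neg, margin)
            + triplet(txt, img, img_neg, margin))

def within_modality_loss(img, img_pos, img_neg,
                         txt, txt_pos, txt_neg, margin=0.2):
    # Within-modality terms: semantically related images (and texts) are
    # pulled together inside their own subspace, preserving each modality's
    # semantic neighborhood rather than only enforcing cross-modal proximity.
    return (triplet(img, img_pos, img_neg, margin)
            + triplet(txt, txt_pos, txt_neg, margin))
```

In this sketch, the within-modality positives would be chosen by semantic similarity (e.g. of the accompanying texts) rather than visual similarity, reflecting the paper's point that semantic coherency need not align with visual coherency.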


Related research

- Revisiting Cross Modal Retrieval (07/19/2018): This paper proposes a cross-modal retrieval system that leverages on ima...
- On Metric Learning for Audio-Text Cross-Modal Retrieval (03/29/2022): Audio-text retrieval aims at retrieving a target audio clip or caption f...
- ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval (03/31/2022): Visual appearance is considered to be the most important cue to understa...
- Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval (01/14/2019): In this paper, we propose to learn shared semantic space with correlatio...
- Image-text Retrieval: A Survey on Recent Research and Development (03/28/2022): In the past few years, cross-modal image-text retrieval (ITR) has experi...
- "Is this an example image?" -- Predicting the Relative Abstractness Level of Image and Text (01/23/2019): Successful multimodal search and retrieval requires the automatic unders...
- Modality-dependent Cross-media Retrieval (06/22/2015): In this paper, we investigate the cross-media retrieval between images a...
