Do Cross Modal Systems Leverage Semantic Relationships?

09/03/2019
by Shah Nawaz, et al.

Current cross-modal retrieval systems are evaluated with the R@K measure, which does not leverage semantic relationships but instead strictly follows the manually annotated image-text query pairs. As a result, current systems do not generalize well to unseen data in the wild. To address this, we propose a new measure, SemanticMap, to evaluate the performance of cross-modal systems. The proposed measure evaluates the semantic similarity between image and text representations in the latent embedding space. We also propose a novel cross-modal retrieval system that uses a single stream network for bidirectional retrieval. The system is based on a deep neural network trained with an extended center loss, which minimizes the distance of image and text descriptions in the latent space from their class centers. In our system, text descriptions are also encoded as images, which enables a single stream network to process both modalities. To the best of our knowledge, this is the first work to employ a single stream network for cross-modal retrieval. The proposed system is evaluated on two publicly available datasets, MSCOCO and Flickr30K, and shows results comparable to the current state of the art.
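The core training idea, pulling embeddings of both modalities toward shared class centers, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names are hypothetical, and it assumes the standard squared-Euclidean center loss (Wen et al., 2016) applied to both modalities with one shared set of centers, which is one plausible reading of "extended center loss".

```python
import numpy as np

def center_loss(embeddings, labels, centers):
    # Standard center loss: half the mean squared distance of each
    # embedding from the center of its class.
    diffs = embeddings - centers[labels]              # (N, D)
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def extended_center_loss(img_emb, txt_emb, labels, centers):
    # Hypothetical sketch: both modalities share the same class
    # centers, so a matching image and text description are pulled
    # toward the same point in the latent space.
    return (center_loss(img_emb, labels, centers)
            + center_loss(txt_emb, labels, centers))
```

Sharing one set of centers across modalities is what makes bidirectional retrieval possible with a single stream network: at test time, nearest-neighbor search in the common latent space works in either direction.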

Related research

- Revisiting Cross Modal Retrieval (07/19/2018): This paper proposes a cross-modal retrieval system that leverages on ima...
- Semantic Modeling of Textual Relationships in Cross-Modal Retrieval (10/31/2018): Feature modeling of different modalities is a basic problem in current r...
- Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval (02/23/2020): This paper considers the task of matching images and sentences by learni...
- Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval (07/29/2022): This paper investigates an open research problem of generating text-imag...
- Semantically Enhanced Hard Negatives for Cross-modal Information Retrieval (10/10/2022): Visual Semantic Embedding (VSE) aims to extract the semantics of images ...
- Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings (09/14/2019): One of the key challenges in learning joint embeddings of multiple modal...
- Single-branch Network for Multimodal Training (03/10/2023): With the rapid growth of social media platforms, users are sharing billi...
