Improving Cross-Modal Retrieval with Set of Diverse Embeddings

11/30/2022
by   Dongwon Kim, et al.
0

Cross-modal retrieval across image and text modalities is a challenging task due to its inherent ambiguity: An image often exhibits various situations, and a caption can be coupled with diverse images. Set-based embedding has been studied as a solution to this problem. It seeks to encode a sample into a set of different embedding vectors that capture different semantics of the sample. In this paper, we present a novel set-based embedding method, which is distinct from previous work in two aspects. First, we present a new similarity function called smooth-Chamfer similarity, which is designed to alleviate the side effects of existing similarity functions for set-based embedding. Second, we propose a novel set prediction module to produce a set of embedding vectors that effectively captures diverse semantics of input by the slot attention mechanism. Our method is evaluated on the COCO and Flickr30K datasets across different visual backbones, where it outperforms existing methods including ones that demand substantially larger computation at inference.

READ FULL TEXT

page 3

page 7

page 12

page 13

research
08/02/2021

Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service

It is widely acknowledged that learning joint embeddings of recipes with...
research
03/09/2020

Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism

Cross-modal food retrieval is an important task to perform analysis of f...
research
11/28/2019

Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA

We propose a novel non-parametric method for cross-modal retrieval which...
research
02/07/2020

Deep Robust Multilevel Semantic Cross-Modal Hashing

Hashing based cross-modal retrieval has recently made significant progre...
research
10/19/2020

DIME: An Online Tool for the Visual Comparison of Cross-Modal Retrieval Models

Cross-modal retrieval relies on accurate models to retrieve relevant res...
research
07/18/2017

VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

We present a new technique for learning visual-semantic embeddings for c...
research
05/06/2023

Keyword-Based Diverse Image Retrieval by Semantics-aware Contrastive Learning and Transformer

In addition to relevance, diversity is an important yet less studied per...

Please sign up or login with your details

Forgot password? Click here to reset