Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations

04/20/2022
by Leila Pishdad, et al.

Probabilistic embeddings have proven useful for capturing polysemous word meanings, as well as ambiguity in image matching. In this paper, we study the advantages of probabilistic embeddings in a cross-modal setting (i.e., text and images), and propose a simple approach that replaces the standard vector point embeddings in extant image-text matching models with probabilistic distributions that are parametrically learned. Our guiding hypothesis is that the uncertainty encoded in the probabilistic embeddings captures the cross-modal ambiguity in the input instances, and that it is through capturing this uncertainty that the probabilistic models can perform better at downstream tasks, such as image-to-text or text-to-image retrieval. Through extensive experiments on standard and new benchmarks, we show a consistent advantage for probabilistic representations in cross-modal retrieval, and validate the ability of our embeddings to capture uncertainty.
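To make the idea of swapping point embeddings for distributions concrete, here is a minimal sketch. The abstract does not specify the distribution family or the matching score, so everything below is an assumption for illustration only: each embedding is taken to be a diagonal Gaussian (a mean vector plus a per-dimension variance), and candidates are ranked by the expected squared Euclidean distance between two independent Gaussians. The function names `expected_sq_distance` and `rank_candidates` are hypothetical and are not taken from the paper.

```python
import numpy as np


def expected_sq_distance(mu_a, var_a, mu_b, var_b):
    """E||x - y||^2 for independent x ~ N(mu_a, diag(var_a)), y ~ N(mu_b, diag(var_b)).

    Closed form: ||mu_a - mu_b||^2 + sum(var_a) + sum(var_b).
    ASSUMPTION: diagonal Gaussians and this score are illustrative choices,
    not the method described in the paper.
    """
    mean_term = np.sum((mu_a - mu_b) ** 2, axis=-1)
    spread_term = np.sum(var_a + var_b, axis=-1)
    return mean_term + spread_term


def rank_candidates(query_mu, query_var, cand_mu, cand_var):
    """Return candidate indices ordered from closest to farthest from the query."""
    distances = expected_sq_distance(
        query_mu[None, :], query_var[None, :], cand_mu, cand_var
    )
    return np.argsort(distances)


# Toy text query and three toy image candidates in a 2-D joint space.
text_mu = np.array([1.0, 0.0])
text_var = np.array([0.1, 0.1])

image_mu = np.array([[1.0, 0.0],   # same mean as the query, low uncertainty
                     [1.0, 0.0],   # same mean as the query, high uncertainty
                     [0.0, 1.0]])  # different mean, low uncertainty
image_var = np.array([[0.1, 0.1],
                      [2.0, 2.0],
                      [0.1, 0.1]])

order = rank_candidates(text_mu, text_var, image_mu, image_var)
distances = expected_sq_distance(text_mu[None, :], text_var[None, :], image_mu, image_var)
```

In this toy setup the first two images share the query's mean, yet the high-variance one scores as farther away than the third image, whose mean differs. A point embedding cannot express that distinction, since it carries no variance term; whether this particular score behaves well in practice is not something the abstract addresses.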


