Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations

by   Leila Pishdad, et al.
York University
McGill University

Probabilistic embeddings have proven useful for capturing polysemous word meanings, as well as ambiguity in image matching. In this paper, we study the advantages of probabilistic embeddings in a cross-modal setting (i.e., text and images), and propose a simple approach that replaces the standard vector point embeddings in extant image-text matching models with probabilistic distributions that are parametrically learned. Our guiding hypothesis is that the uncertainty encoded in the probabilistic embeddings captures the cross-modal ambiguity in the input instances, and that it is through capturing this uncertainty that the probabilistic models can perform better at downstream tasks, such as image-to-text or text-to-image retrieval. Through extensive experiments on standard and new benchmarks, we show a consistent advantage for probabilistic representations in cross-modal retrieval, and validate the ability of our embeddings to capture uncertainty.


page 7

page 12

page 13


Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

In this work we introduce a cross modal image retrieval system that allo...

Probabilistic Embeddings for Cross-Modal Retrieval

Cross-modal retrieval methods build a common representation space for sa...

Improved Probabilistic Image-Text Representations

Image-Text Matching (ITM) task, a fundamental vision-language (VL) task,...

Is Cross-modal Information Retrieval Possible without Training?

Encoded representations from a pretrained deep learning model (e.g., BER...

Plug-and-Play Regulators for Image-Text Matching

Exploiting fine-grained correspondence and visual-semantic alignments ha...

Show, Translate and Tell

Humans have an incredible ability to process and understand information ...

Please sign up or login with your details

Forgot password? Click here to reset