Probabilistic Compositional Embeddings for Multimodal Image Retrieval

04/12/2022
by   Andrei Neculai, et al.

Existing work in image retrieval usually considers queries with one or two inputs, an assumption that does not generalize to multiple queries. In this work, we investigate a more challenging scenario: composing multiple multimodal queries for image retrieval. Given an arbitrary number of query images and/or texts, our goal is to retrieve target images containing the semantic concepts specified across those queries. To learn an informative embedding that can flexibly encode the semantics of various queries, we propose a novel multimodal probabilistic composer (MPC). Specifically, we model input images and texts as probabilistic embeddings, which are then combined by a probabilistic composition rule to support image retrieval with multiple multimodal queries. We propose a new benchmark based on the MS-COCO dataset and evaluate our model on various setups that compose multiple image and/or text queries for multimodal image retrieval. Without bells and whistles, we show that our probabilistic formulation significantly outperforms existing related methods on multimodal image retrieval, while generalizing well to queries with different numbers of inputs given in arbitrary visual and/or textual modalities. Code is available here: https://github.com/andreineculai/MPC.
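
As a rough illustration of what composing probabilistic embeddings can look like, the sketch below multiplies diagonal Gaussian densities, one common way to fuse an arbitrary number of Gaussian embeddings. The abstract does not state the composition rule or the form of the embeddings, so the Gaussian assumption, the product rule, and the name `compose_gaussians` are all assumptions made here for illustration; see the linked repository for the authors' actual method.

```python
def compose_gaussians(means, variances):
    """Fuse N diagonal Gaussians by multiplying their densities.

    Hypothetical helper, not taken from the MPC code base.
    means, variances: lists of N vectors, each of length D.
    Returns (mean, variance) of the renormalised product, each of length D.
    """
    dims = len(means[0])
    composed_mean, composed_var = [], []
    for d in range(dims):
        # Precision (inverse variance) of each query in this dimension.
        precisions = [1.0 / v[d] for v in variances]
        total_precision = sum(precisions)
        # Precision-weighted sum of the query means.
        weighted = sum(p * m[d] for p, m in zip(precisions, means))
        composed_var.append(1.0 / total_precision)
        composed_mean.append(weighted / total_precision)
    return composed_mean, composed_var


# Two made-up 2-D query embeddings (e.g. one image, one text).
mu, var = compose_gaussians(
    means=[[0.0, 2.0], [2.0, 2.0]],
    variances=[[1.0, 1.0], [1.0, 4.0]],
)
print(mu, var)
```

Under this rule, each additional query can only shrink the composed variance, and the function accepts any number of inputs regardless of modality, which is the kind of flexibility the abstract describes.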


