Multi-Modal Retrieval using Graph Neural Networks

10/04/2020
by   Aashish Kumar Misraa, et al.
0

Most real world applications of image retrieval such as Adobe Stock, which is a marketplace for stock photography and illustrations, need a way for users to find images which are both visually (i.e. aesthetically) and conceptually (i.e. containing the same salient objects) as a query image. Learning visual-semantic representations from images is a well studied problem for image retrieval. Filtering based on image concepts or attributes is traditionally achieved with index-based filtering (e.g. on textual tags) or by re-ranking after an initial visual embedding based retrieval. In this paper, we learn a joint vision and concept embedding in the same high-dimensional space. This joint model gives the user fine-grained control over the semantics of the result set, allowing them to explore the catalog of images more rapidly. We model the visual and concept relationships as a graph structure, which captures the rich information through node neighborhood. This graph structure helps us learn multi-modal node embeddings using Graph Neural Networks. We also introduce a novel inference time control, based on selective neighborhood connectivity allowing the user control over the retrieval algorithm. We evaluate these multi-modal embeddings quantitatively on the downstream relevance task of image retrieval on MS-COCO dataset and qualitatively on MS-COCO and an Adobe Stock dataset.

READ FULL TEXT

page 2

page 4

page 5

page 6

research
09/21/2020

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Scene text instances found in natural images carry explicit semantic inf...
research
05/24/2022

Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural Networks

Learning effective recipe representations is essential in food studies. ...
research
07/18/2017

VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

We present a new technique for learning visual-semantic embeddings for c...
research
10/17/2021

Contrastive Learning of Visual-Semantic Embeddings

Contrastive learning is a powerful technique to learn representations th...
research
11/19/2020

Using Text to Teach Image Retrieval

Image retrieval relies heavily on the quality of the data modeling and t...
research
12/29/2020

Image-to-Image Retrieval by Learning Similarity between Scene Graphs

As a scene graph compactly summarizes the high-level content of an image...
research
04/04/2022

"This is my unicorn, Fluffy": Personalizing frozen vision-language representations

Large Vision Language models pretrained on web-scale data provide re...

Please sign up or login with your details

Forgot password? Click here to reset