I Want This Product but Different : Multimodal Retrieval with Synthetic Query Expansion

02/17/2021
by   Ivona Tautkute, et al.
0

This paper addresses the problem of media retrieval using a multimodal query (a query which combines visual input with additional semantic information in natural language feedback). We propose a SynthTriplet GAN framework which resolves this task by expanding the multimodal query with a synthetically generated image that captures semantic information from both image and text input. We introduce a novel triplet mining method that uses a synthetic image as an anchor to directly optimize for embedding distances of generated and target images. We demonstrate that apart from the added value of retrieval illustration with synthetic image with the focus on customization and user feedback, the proposed method greatly surpasses other multimodal generation methods and achieves state of the art results in the multimodal retrieval task. We also show that in contrast to other retrieval methods, our method provides explainable embeddings.

READ FULL TEXT

page 1

page 5

page 6

page 7

research
10/17/2022

Bridging the Gap between Local Semantic Concepts and Bag of Visual Words for Natural Scene Image Retrieval

This paper addresses the problem of semantic-based image retrieval of na...
research
08/20/2018

Learning to Learn from Web Data through Deep Semantic Embeddings

In this paper we propose to learn a multimodal image and text embedding ...
research
12/20/2022

Generation-Augmented Query Expansion For Code Retrieval

Pre-trained language models have achieved promising success in code retr...
research
01/07/2019

Self-Supervised Learning from Web Data for Multimodal Retrieval

Self-Supervised learning from multimodal image and text data allows deep...
research
10/07/2020

VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks

Text-to-image multimodal tasks, generating/retrieving an image from a gi...
research
08/29/2023

A Multimodal Visual Encoding Model Aided by Introducing Verbal Semantic Information

Biological research has revealed that the verbal semantic information in...
research
05/18/2021

A multimodal deep learning framework for scalable content based visual media retrieval

We propose a novel, efficient, modular and scalable framework for conten...

Please sign up or login with your details

Forgot password? Click here to reset