ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity

03/15/2022
by Ginger Delmas, et al.

An intuitive way to search for images is to use queries composed of an example image and a complementary text. While the former provides rich, implicit context for the search, the latter explicitly calls for new traits or specifies how some elements of the example image should be changed to retrieve the desired target image. Current approaches typically combine the features of the two query elements into a single representation, which can then be compared to those of the candidate target images. Our work aims to shed new light on the task by looking at it through the prism of two familiar and related frameworks: text-to-image and image-to-image retrieval. Taking inspiration from them, we exploit the specific relation of each query element with the target image and derive lightweight attention mechanisms that mediate between the two complementary modalities. We validate our approach on several retrieval benchmarks, querying with images and their associated free-form text modifiers. Our method obtains state-of-the-art results without resorting to side information, multi-level features, heavy pre-training, or large architectures, as previous works do.
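
To make the two-framework view more concrete, here is a minimal sketch of how a candidate target could be scored against both query elements: one text-to-image term (explicit matching) and one image-to-image term (implicit similarity), each filtered by a lightweight text-conditioned attention. Everything in it is an assumption made for illustration: the function names (`sigmoid_gate`, `score`, `rank`), the single linear layer with a sigmoid as the attention, the weights `w_em` and `w_is` (which would be learned in practice), and the plain sum of the two terms. It is not the authors' implementation.

```python
import math

def l2_normalize(v):
    # Guard against zero vectors so the division is always defined.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def sigmoid_gate(weights, text_feat):
    # Hypothetical attention: one linear layer followed by a sigmoid,
    # giving a per-dimension weight in (0, 1) that depends on the text.
    return [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, text_feat))))
        for row in weights
    ]

def score(ref, text, target, w_em, w_is):
    # Both attentions are conditioned on the text modifier (an assumption).
    a_em = sigmoid_gate(w_em, text)
    a_is = sigmoid_gate(w_is, text)
    # Text-to-image term: does the target show what the text asks for?
    explicit = cosine(text, [a * t for a, t in zip(a_em, target)])
    # Image-to-image term: does the target keep what the reference implies?
    implicit = cosine(
        [a * r for a, r in zip(a_is, ref)],
        [a * t for a, t in zip(a_is, target)],
    )
    return explicit + implicit

def rank(ref, text, gallery, w_em, w_is):
    # Return gallery indices sorted from best to worst match.
    scores = [score(ref, text, t, w_em, w_is) for t in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

In this sketch the query is never fused into a single vector; each query element is compared to the target separately, which is one way to read the contrast the abstract draws with prior approaches.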

research
09/05/2023

Dual Relation Alignment for Composed Image Retrieval

Composed image retrieval, a task involving the search for a target image...
research
12/18/2018

Composing Text and Image for Image Retrieval - An Empirical Odyssey

In this paper, we study the task of image retrieval, where the input que...
research
04/24/2022

Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Image retrieval with hybrid-modality queries, also known as composing te...
research
08/31/2023

Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval

We consider the problem of composed image retrieval that takes an input ...
research
07/09/2022

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

Content-Based Image Retrieval (CIR) aims to search for a target image by...
research
09/04/2023

Target-Guided Composed Image Retrieval

Composed image retrieval (CIR) is a new and flexible image retrieval par...
research
11/01/2017

Query-free Clothing Retrieval via Implicit Relevance Feedback

Image-based clothing retrieval is receiving increasing interest with the...
