Zero-Shot Composed Image Retrieval with Textual Inversion

03/27/2023
by Alberto Baldrati, et al.

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images. Existing methods rely on supervised learning, and the high effort and cost of labeling CIR datasets hamper their widespread use. In this work, we propose a new task, Zero-Shot CIR (ZS-CIR), that aims to address CIR without requiring a labeled training dataset. Our approach, named zero-Shot composEd imAge Retrieval with textuaL invErsion (SEARLE), maps the visual features of the reference image into a pseudo-word token in the CLIP token embedding space and integrates it with the relative caption. To support research on ZS-CIR, we introduce an open-domain benchmarking dataset named Composed Image Retrieval on Common Objects in context (CIRCO), which is the first dataset for CIR containing multiple ground truths for each query. Experiments show that SEARLE outperforms the baselines on the two main datasets for CIR tasks, FashionIQ and CIRR, and on the proposed CIRCO. The dataset, the code, and the model are publicly available at https://github.com/miccunifi/SEARLE.
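
The idea of turning the reference image into a pseudo-word and combining it with the relative caption can be illustrated with a toy sketch. This is not the SEARLE implementation: the 4-dimensional embeddings, the identity matrix standing in for the image-to-token mapping, the mean-pooling "text encoder", and all function names are assumptions made for illustration only. A real system would use CLIP encoders and a learned mapping.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def image_to_pseudo_token(image_features, weight):
    """Hypothetical linear map from image features to the token embedding space."""
    return [sum(w * x for w, x in zip(row, image_features)) for row in weight]

def encode_caption(tokens, vocab, pseudo_token):
    """Toy text encoder: mean-pool token embeddings, with '$' standing for the pseudo-word."""
    vectors = [pseudo_token if t == "$" else vocab[t] for t in tokens]
    pooled = [sum(col) / len(vectors) for col in zip(*vectors)]
    return normalize(pooled)

def retrieve(query, gallery):
    """Rank gallery image names by cosine similarity to the composed query."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)

# Hand-made toy embeddings: axes 0/1 encode colour, axes 2/3 encode object.
vocab = {"red": [1, 0, 0, 0], "blue": [0, 1, 0, 0]}
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

reference_features = [0, 0, 1, 0]  # reference image: a "dog" with no colour information
pseudo_token = image_to_pseudo_token(reference_features, identity)

# Relative caption "red", composed with the pseudo-word for the reference image.
query = encode_caption(["$", "red"], vocab, pseudo_token)

gallery = {
    "red_dog": [1, 0, 1, 0],
    "blue_dog": [0, 1, 1, 0],
    "red_cat": [1, 0, 0, 1],
}
ranking = retrieve(query, gallery)
print(ranking[0])
```

In this toy setting the composed query ranks the image that matches both the reference content and the caption first. No labeled (reference, caption, target) triplets are involved at any point, which is the property the zero-shot setting relies on.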
