Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

02/06/2023
by   Kuniaki Saito, et al.

In Composed Image Retrieval (CIR), a user combines a query image with text to describe their intended target. Existing methods rely on supervised learning of CIR models from labeled triplets, each consisting of a query image, a text specification, and a target image. Labeling such triplets is expensive and hinders the broad applicability of CIR. In this work, we propose to study an important task, Zero-Shot Composed Image Retrieval (ZS-CIR), whose goal is to build a CIR model without requiring labeled triplets for training. To this end, we propose a novel method, called Pic2Word, that requires only weakly labeled image-caption pairs and unlabeled image datasets for training. Unlike existing supervised CIR models, our model, trained on weakly labeled or unlabeled datasets, shows strong generalization across diverse ZS-CIR tasks, e.g., attribute editing, object composition, and domain conversion. Our approach outperforms several supervised CIR methods on the common CIR benchmarks, CIRR and Fashion-IQ. Code will be made publicly available at https://github.com/google-research/composed_image_retrieval.
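To make the retrieval setting concrete, here is a minimal sketch of a composed query ranked against a small gallery. It is not the Pic2Word method: the `compose` function is a hypothetical averaging baseline, and the three-dimensional embeddings, their axis meanings, and the gallery names are all made up for illustration. In practice the embeddings would come from pretrained image and text encoders.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def compose(image_emb, text_emb):
    """Hypothetical baseline: average the unit-length image and text embeddings.

    This is an assumption for illustration, not the composition used by Pic2Word.
    """
    img, txt = normalize(image_emb), normalize(text_emb)
    return normalize([(i + t) / 2 for i, t in zip(img, txt)])


def retrieve(query, gallery):
    """Return gallery names sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)


# Toy embeddings with made-up axes: [dog-ness, photo-ness, sketch-ness].
query_image = [1.0, 1.0, 0.0]    # a photo of a dog
modifier_text = [0.0, 0.0, 1.0]  # text asking for a sketch
gallery = {
    "dog_photo": [1.0, 1.0, 0.0],
    "dog_sketch": [1.0, 0.0, 1.0],
    "cat_photo": [0.0, 1.0, 0.0],
}

ranking = retrieve(compose(query_image, modifier_text), gallery)
print(ranking)
```

In this toy setup the composed query ranks the dog sketch above the original dog photo, which mirrors the domain conversion case mentioned in the abstract.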

research
03/27/2023

Zero-Shot Composed Image Retrieval with Textual Inversion

Composed Image Retrieval (CIR) aims to retrieve a target image based on ...
research
10/05/2022

Granularity-aware Adaptation for Image Retrieval over Multiple Tasks

Strong image search models can be learned for a specific domain, i.e., set...
research
03/21/2023

CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion

This paper proposes a novel diffusion-based model, CompoDiff, for solvin...
research
06/12/2023

Zero-shot Composed Text-Image Retrieval

In this paper, we consider the problem of composed image retrieval (CIR)...
research
06/13/2023

GeneCIS: A Benchmark for General Conditional Image Similarity

We argue that there are many notions of 'similarity' and that models, li...
research
08/28/2023

CoVR: Learning Composed Video Retrieval from Web Video Captions

Composed Image Retrieval (CoIR) has recently gained popularity as a task...
research
05/17/2023

Self-Training Boosted Multi-Faceted Matching Network for Composed Image Retrieval

The composed image retrieval (CIR) task aims to retrieve the desired tar...