Self-Training Boosted Multi-Faceted Matching Network for Composed Image Retrieval

05/17/2023
by   Haokun Wen, et al.
0

The composed image retrieval (CIR) task aims to retrieve the desired target image for a given multimodal query, i.e., a reference image with its corresponding modification text. The key limitations encountered by existing efforts are two aspects: 1) ignoring the multi-faceted query-target matching factors; 2) ignoring the potential unlabeled reference-target image pairs in existing benchmark datasets. To address these two limitations is non-trivial due to the following challenges: 1) how to effectively model the multi-faceted matching factors in a latent way without direct supervision signals; 2) how to fully utilize the potential unlabeled reference-target image pairs to improve the generalization ability of the CIR model. To address these challenges, in this work, we first propose a muLtI-faceted Matching Network (LIMN), which consists of three key modules: multi-grained image/text encoder, latent factor-oriented feature aggregation, and query-target matching modeling. Thereafter, we design an iterative dual self-training paradigm to further enhance the performance of LIMN by fully utilizing the potential unlabeled reference-target image pairs in a semi-supervised manner. Specifically, we denote the iterative dual self-training paradigm enhanced LIMN as LIMN+. Extensive experiments on three real-world datasets, FashionIQ, Shoes, and Birds-to-Words, show that our proposed method significantly surpasses the state-of-the-art baselines.

READ FULL TEXT

page 1

page 6

page 10

page 11

research
09/04/2023

Target-Guided Composed Image Retrieval

Composed image retrieval (CIR) is a new and flexible image retrieval par...
research
03/29/2023

Bi-directional Training for Composed Image Retrieval via Text Prompt Learning

Composed image retrieval searches for a target image based on a multi-mo...
research
09/05/2023

Dual Relation Alignment for Composed Image Retrieval

Composed image retrieval, a task involving the search for a target image...
research
02/06/2023

Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

In Composed Image Retrieval (CIR), a user combines a query image with te...
research
11/14/2022

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

We investigate composed image retrieval with text feedback. Users gradua...
research
05/25/2023

Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

Composed image retrieval aims to find an image that best matches a given...
research
04/07/2021

RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network

In this paper, we study the compositional learning of images and texts f...

Please sign up or login with your details

Forgot password? Click here to reset