Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

by Yiyang Chen, et al.

We investigate composed image retrieval with text feedback. Users gradually narrow down the target of interest by moving from coarse- to fine-grained feedback. However, existing methods focus only on the latter, i.e., fine-grained search, by harnessing positive and negative pairs during training. This pair-based paradigm considers only the one-to-one distance between a pair of specific points, which is not aligned with the one-to-many coarse-grained retrieval process and compromises the recall rate. To fill this gap, we introduce a unified learning approach that simultaneously models coarse- and fine-grained retrieval by considering multi-grained uncertainty. The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively. Specifically, our method contains two modules: uncertainty modeling and uncertainty regularization. (1) Uncertainty modeling simulates multi-grained queries by introducing identically distributed fluctuations in the feature space. (2) Building on uncertainty modeling, uncertainty regularization adapts the matching objective according to the fluctuation range. Compared with existing methods, the proposed strategy explicitly prevents the model from pushing away potential candidates in the early stage, and thus improves the recall rate. On the three public datasets, i.e., FashionIQ, Fashion200k, and Shoes, the proposed method has achieved +4.03 baseline, respectively.
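The abstract describes the two modules only at a high level. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation. It models uncertainty as i.i.d. Gaussian noise added to the query feature at several noise scales, and, as an assumed form of uncertainty regularization, it scales the temperature of a softmax contrastive loss with the noise level, so that heavily perturbed (coarse-grained) queries are matched more softly and negatives are pushed away less sharply. All names and parameters (`perturb`, `info_nce`, `multi_grained_loss`, `sigmas`, `base_tau`) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def perturb(feature: Vector, sigma: float, rng: random.Random) -> Vector:
    """Uncertainty modeling (sketch): add i.i.d. N(0, sigma^2) noise to every dimension."""
    return [x + rng.gauss(0.0, sigma) for x in feature]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; the small epsilon guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(query: Vector, targets: Sequence[Vector], pos: int, tau: float) -> float:
    """Cross-entropy of the positive target under softmax(cosine / tau) over all targets."""
    logits = [cosine(query, t) / tau for t in targets]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[pos]


def multi_grained_loss(
    query: Vector,
    targets: Sequence[Vector],
    pos: int,
    sigmas: Sequence[float] = (0.0, 0.1, 0.5),
    base_tau: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Average the matching loss over fluctuation levels.

    sigma = 0 is the fine-grained (unperturbed) query; larger sigmas simulate
    coarser queries. Assumption: the temperature grows with sigma, giving a
    softer, one-to-many matching objective for large fluctuations.
    """
    rng = rng or random.Random(0)
    losses = []
    for sigma in sigmas:
        q = perturb(query, sigma, rng) if sigma > 0 else list(query)
        tau = base_tau * (1.0 + sigma)  # hypothetical fluctuation-dependent temperature
        losses.append(info_nce(q, targets, pos, tau))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    query = [1.0, 0.0]
    targets = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(multi_grained_loss(query, targets, pos=0))
```

Raising the temperature flattens the softmax, so for a large-fluctuation query the loss tolerates several nearby candidates instead of concentrating all probability on one target. This is one simple way to make the objective depend on the fluctuation range; the paper's actual regularizer may differ.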




Fashion Image Retrieval with Multi-Granular Alignment

Fashion image retrieval task aims to search relevant clothing items of a...

Ranking-aware Uncertainty for Text-guided Image Retrieval

Text-guided image retrieval is to incorporate conditional text to better...

One-Shot Fine-Grained Instance Retrieval

Fine-Grained Visual Categorization (FGVC) has achieved significant progr...

Video-Text Retrieval by Supervised Multi-Space Multi-Grained Alignment

While recent progress in video-text retrieval has been advanced by the e...

A New Benchmark and Approach for Fine-grained Cross-media Retrieval

Cross-media retrieval is to return the results of various media types co...

Self-Training Boosted Multi-Faceted Matching Network for Composed Image Retrieval

The composed image retrieval (CIR) task aims to retrieve the desired tar...

Refining Coarse-grained Spatial Data using Auxiliary Spatial Data Sets with Various Granularities

We propose a probabilistic model for refining coarse-grained spatial dat...
