Progressive Learning for Image Retrieval with Hybrid-Modality Queries

04/24/2022
by   Yida Zhao, et al.
0

Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities. For example, a target product image is searched using a reference product image along with text about changing certain attributes of the reference image as the query. It is a more challenging image retrieval task that requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then transfer the learned knowledge to the fashion-domain with fashion-related pre-training tasks. Finally, we enhance the pre-trained model from single-query to hybrid-modality query for the CTI-IR task. Furthermore, as the contribution of individual modality in the hybrid-modality query varies for different retrieval scenarios, we propose a self-supervised adaptive weighting strategy to dynamically determine the importance of image and text in the hybrid-modality query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9

READ FULL TEXT

page 1

page 9

research
04/28/2018

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

In this work we introduce a cross modal image retrieval system that allo...
research
07/09/2022

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

Content-Based Image Retrieval (CIR) aims to search for a target image by...
research
09/01/2022

Universal Multi-Modality Retrieval with One Unified Embedding Space

This paper presents Vision-Language Universal Search (VL-UnivSearch), wh...
research
03/06/2023

MABNet: Master Assistant Buddy Network with Hybrid Learning for Image Retrieval

Image retrieval has garnered growing interest in recent times. The curre...
research
04/07/2019

Scalable Change Retrieval Using Deep 3D Neural Codes

We present a novel scalable framework for image change detection (ICD) f...
research
03/15/2022

ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity

An intuitive way to search for images is to use queries composed of an e...
research
12/01/2018

Towards Traversing the Continuous Spectrum of Image Retrieval

Image retrieval is one of the most popular tasks in computer vision. How...

Please sign up or login with your details

Forgot password? Click here to reset