Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval

10/19/2022
by   Abhra Chaudhuri, et al.

Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information. As instances from different modalities can often provide complementary information describing the underlying concept, we propose a cross-attention framework for Vision Transformers (XModalViT) that fuses modality-specific information instead of discarding it. Our framework first maps paired datapoints from the individual photo and sketch modalities to fused representations that unify information from both modalities. We then decouple the input space of this modality fusion network into independent encoders for the individual modalities via contrastive and relational cross-modal knowledge distillation. Such encoders can then be applied to downstream tasks like cross-modal retrieval. We demonstrate the expressive capacity of the learned representations through a wide range of experiments, achieving state-of-the-art results on three fine-grained sketch-based image retrieval benchmarks: Shoe-V2, Chair-V2 and Sketchy. Implementation is available at https://github.com/abhrac/xmodal-vit.
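To make the distillation step more concrete, the following is a minimal NumPy sketch of what a contrastive and a relational distillation objective could look like when a single-modality student encoder is trained to mimic fused teacher representations. It is not the authors' implementation: the abstract does not give the loss formulas, so the InfoNCE-style contrastive term, the pairwise-distance matching term, the temperature value, and all function names here are assumptions chosen for illustration. The random arrays stand in for encoder outputs.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def contrastive_distillation_loss(student, teacher, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of the student should be
    closer to row i of the teacher than to any other teacher row."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    logits = s @ t.T / temperature                      # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def pairwise_distances(x):
    """Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def relational_distillation_loss(student, teacher):
    """Relational loss (assumed form): match the pairwise-distance structure
    of the student batch to that of the teacher batch."""
    ds = pairwise_distances(l2_normalize(student))
    dt = pairwise_distances(l2_normalize(teacher))
    return float(np.mean((ds - dt) ** 2))


def retrieve(query_emb, gallery_emb):
    """Rank gallery items for each query by descending cosine similarity."""
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T
    return np.argsort(-sims, axis=1)


# Hypothetical stand-ins for encoder outputs: 8 paired samples, 16 dimensions.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))                   # fused photo+sketch embeddings
student = teacher + 0.01 * rng.normal(size=(8, 16))  # sketch-only student embeddings

total = (contrastive_distillation_loss(student, teacher)
         + relational_distillation_loss(student, teacher))
ranking = retrieve(student, teacher)  # each row lists gallery indices, best first
```

Once distilled, the student encoders no longer need paired input at test time: a sketch embedding can be compared directly against a gallery of photo embeddings, which is what `retrieve` illustrates.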


