Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style

03/25/2023
by Fengyin Lin, et al.

This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), with two significant differences from prior art: (i) we tackle all variants of ZS-SBIR (inter-category, intra-category, and cross-dataset) with just one network (“everything”), and (ii) we want to understand how this sketch-photo matching operates (“explainable”). Our key innovation lies in the realization that such a cross-modal matching problem can be reduced to comparisons of groups of key local patches, akin to the seasoned “bag-of-words” paradigm. With this change alone, we are able to achieve both of the aforementioned goals, with the added benefit of no longer requiring external semantic knowledge. Technically, ours is a transformer-based cross-modal network with three novel components: (i) a self-attention module with a learnable tokenizer to produce visual tokens that correspond to the most informative local regions, (ii) a cross-attention module to compute local correspondences between the visual tokens across the two modalities, and (iii) a kernel-based relation network to assemble local putative matches and produce an overall similarity metric for a sketch-photo pair. Experiments show that our method delivers superior performance across all ZS-SBIR settings. The all-important explainability goal is achieved by visualizing cross-modal token correspondences and, for the first time, via sketch-to-photo synthesis by universal replacement of all matched photo patches. Code and model are available at <https://github.com/buptLinfy/ZSE-SBIR>.
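
The idea of matching a sketch to a photo through local token correspondences can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`cross_attend`, `best_matches`, `pair_similarity`) are hypothetical, tokens are plain lists of floats rather than learned visual tokens, and the kernel-based relation network is replaced here by a simple mean of cosine similarities, purely as a stand-in.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(sketch_tokens, photo_tokens):
    """For each sketch token (query), attend over all photo tokens.

    Returns one (weights, attended_vector) pair per sketch token, using
    scaled dot-product attention with photo tokens as both keys and values.
    """
    dim = len(sketch_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    results = []
    for query in sketch_tokens:
        weights = softmax([dot(query, key) * scale for key in photo_tokens])
        attended = [
            sum(w * value[i] for w, value in zip(weights, photo_tokens))
            for i in range(dim)
        ]
        results.append((weights, attended))
    return results


def best_matches(sketch_tokens, photo_tokens):
    """Index of the most-attended photo token for each sketch token.

    These indices are the kind of local correspondence one could visualize.
    """
    return [
        max(range(len(weights)), key=weights.__getitem__)
        for weights, _ in cross_attend(sketch_tokens, photo_tokens)
    ]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def pair_similarity(sketch_tokens, photo_tokens):
    """Assumed stand-in for the relation network: average, over sketch
    tokens, of the cosine similarity to the attended photo vector."""
    attended = cross_attend(sketch_tokens, photo_tokens)
    sims = [cosine(q, vec) for q, (_, vec) in zip(sketch_tokens, attended)]
    return sum(sims) / len(sims)


# Toy data: two sketch tokens and two candidate photos with two tokens each.
sketch = [[1.0, 0.0], [0.0, 1.0]]
photo_a = [[5.0, 0.0], [0.0, 5.0]]    # tokens aligned with the sketch
photo_b = [[-5.0, 0.0], [0.0, -5.0]]  # tokens opposed to the sketch
```

With this toy data, ranking the two photos by `pair_similarity(sketch, photo)` places `photo_a` first, and `best_matches(sketch, photo_a)` pairs each sketch token with the photo token pointing in the same direction. In a real system the tokens would come from a trained network, and the aggregation step would be learned rather than a fixed average.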

research
05/28/2017

Cross-modal Subspace Learning for Fine-grained Sketch-based Image Retrieval

Sketch-based image retrieval (SBIR) is challenging due to the inherent d...
research
10/19/2022

Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval

Representation learning for sketch-based image retrieval has mostly been...
research
03/29/2021

StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

Sketch-based image retrieval (SBIR) is a cross-modal matching problem wh...
research
02/11/2022

WAD-CMSN: Wasserstein Distance based Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval

Zero-shot sketch-based image retrieval (ZSSBIR), as a popular studied br...
research
03/14/2023

Data-Free Sketch-Based Image Retrieval

Rising concerns about privacy and anonymity preservation of deep learnin...
research
03/28/2022

Sketch3T: Test-Time Training for Zero-Shot SBIR

Zero-shot sketch-based image retrieval typically asks for a trained mode...
research
01/11/2023

EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata

We learn a visual representation that captures information about the cam...
