Transformers and CNNs both Beat Humans on SBIR

09/14/2022
by Omar Seddati, et al.

Sketch-based image retrieval (SBIR) is the task of retrieving natural images (photos) that match both the semantics and the spatial configuration of hand-drawn sketch queries. The universality of sketches widens the range of possible applications and increases the demand for efficient SBIR solutions. In this paper, we study classic triplet-based SBIR solutions and show that a persistent invariance to horizontal flip, which remains even after model finetuning, harms performance. To overcome this limitation, we propose several approaches and evaluate each of them in depth. Our main contributions are twofold: (1) we propose and evaluate several intuitive modifications for building SBIR solutions with better flip equivariance; (2) we show that vision transformers are better suited to the SBIR task and outperform CNNs by a large margin. We carried out numerous experiments and introduce the first models to surpass human performance on a large-scale SBIR benchmark (Sketchy). Our best model achieves a recall of 62.25 on the Sketchy benchmark, compared with 46.2 for previous state-of-the-art methods.
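
To make the flip-invariance issue concrete, here is a minimal toy sketch in Python. It is not the authors' models or training code: the 3x3 "images", the two embedding functions (`embed_flip_invariant`, `embed_flip_sensitive`), and the margin value are all hypothetical, chosen only to illustrate how a standard triplet loss behaves when the embedding cannot tell an image from its mirror.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hflip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def embed_flip_invariant(image):
    """Hypothetical embedding: row sums, which ignore left/right layout."""
    return [float(sum(row)) for row in image]


def embed_flip_sensitive(image):
    """Hypothetical embedding: raw pixels, which keep left/right layout."""
    return [float(p) for row in image for p in row]


# Toy data (assumption): the sketch and the matching photo share a layout,
# and the negative is the same photo mirrored horizontally.
sketch = [[1, 0, 0],
          [1, 1, 0],
          [1, 0, 0]]
photo = [row[:] for row in sketch]
mirrored_photo = hflip(photo)

loss_invariant = triplet_loss(
    embed_flip_invariant(sketch),
    embed_flip_invariant(photo),
    embed_flip_invariant(mirrored_photo),
)
loss_sensitive = triplet_loss(
    embed_flip_sensitive(sketch),
    embed_flip_sensitive(photo),
    embed_flip_sensitive(mirrored_photo),
)

print("flip-invariant embedding loss:", loss_invariant)
print("flip-sensitive embedding loss:", loss_sensitive)
```

With the flip-invariant embedding, the photo and its mirror image lie at the same distance from the sketch, so the loss stays at the margin and the two candidates cannot be ranked. With the flip-sensitive embedding, the mirrored photo is pushed far enough away for the loss to reach zero. This is only an illustration of why invariance to horizontal flip conflicts with matching spatial configuration.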

