Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification

11/23/2022
by   Nakul Sharma, et al.

In this paper, we study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting. This setup is significantly more challenging than the traditionally studied 'closed-set' and 'large-scale training samples per category' logo recognition settings. We propose a novel multi-view textual-visual encoding framework that encodes both the text appearing in a logo and its graphical design to learn robust contrastive representations. These representations are learned jointly for multiple views of logos over a batch, and thereby generalize well to unseen logos. We evaluate the proposed framework on cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scenes, and compare it against state-of-the-art methods. Further, the literature lacks a 'very-large-scale' collection of reference logo images that can facilitate the study of one-hundred-thousand-scale logo identification. To fill this gap, we introduce the Wikidata Reference Logo Dataset (WiRLD), containing logos for 100K business brands harvested from Wikidata. Our proposed framework achieves an area under the ROC curve of 91.3% on the QMUL-OpenLogo dataset for the verification task, and outperforms state-of-the-art methods by 9.1% [...] on the Toplogos-10 and the FlickrLogos32 datasets, respectively. Further, we show that our method is more stable than other baselines even when the number of candidate logos is on a 100K scale.
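To make the open-set one-shot setting concrete, the following is a minimal sketch of how identification against a gallery of single reference embeddings might look. It is not the authors' framework: the toy three-dimensional embeddings, the brand names, the `identify` helper, and the 0.5 rejection threshold are all assumptions made for illustration, and the encoder that would produce the embeddings is left out entirely.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def identify(query, references, threshold=0.5):
    """Match a query embedding against one reference embedding per brand.

    Returns (brand, score) for the most similar reference, or (None, score)
    when the best score falls below the threshold (the open-set rejection case).
    The threshold value is a hypothetical choice, not one taken from the paper.
    """
    best_brand, best_score = None, -1.0
    for brand, embedding in references.items():
        score = cosine(query, embedding)
        if score > best_score:
            best_brand, best_score = brand, score
    if best_score < threshold:
        return None, best_score
    return best_brand, best_score


# Hypothetical gallery: one reference embedding per brand (the one-shot setting).
references = {
    "brand_a": [1.0, 0.0, 0.0],
    "brand_b": [0.0, 1.0, 0.0],
}

known_brand, known_score = identify([0.9, 0.1, 0.0], references)
unknown_brand, unknown_score = identify([0.0, 0.0, 1.0], references)
print(known_brand, unknown_brand)
```

In this sketch, adding a new brand only requires adding one reference embedding to the gallery, with no retraining, which is what makes the one-shot formulation attractive at a 100K scale.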


