Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval

04/18/2016
by Xiu-Shen Wei, et al.

Deep convolutional neural network models pre-trained for the ImageNet classification task have been successfully adapted to tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the purely unsupervised setting: fine-grained image retrieval. Fine-grained images are difficult to classify even with image labels, and unsupervised retrieval is harder still. We propose the Selective Convolutional Descriptor Aggregation (SCDA) method. SCDA first localizes the main object in a fine-grained image, a step that discards the noisy background and keeps the useful deep descriptors. The selected descriptors are then aggregated and reduced in dimensionality into a short feature vector using the best practices we found. SCDA is unsupervised: it uses no image labels or bounding box annotations. Experiments on six fine-grained datasets confirm the effectiveness of SCDA for fine-grained image retrieval. In addition, visualization of the SCDA features shows that they correspond to visual attributes (even subtle ones), which might explain SCDA's high mean average precision in fine-grained retrieval. Moreover, on general image retrieval datasets, SCDA achieves retrieval results comparable to those of state-of-the-art general image retrieval approaches.
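
The select-then-aggregate idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual selection rule or aggregation scheme, so everything specific here is an assumption made for illustration: positions are kept when their channel-summed activation exceeds the mean over the feature map, and the kept descriptors are combined by concatenating average and max pooling. The names `select_descriptors` and `aggregate` are hypothetical, and the dimensionality reduction step is omitted.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of D-dimensional deep descriptors


def select_descriptors(fmap: FeatureMap) -> List[Vector]:
    """Keep descriptors at positions with above-average total activation.

    Assumption: a high channel-summed activation marks the main object,
    and everything else is treated as background.
    """
    sums = [[sum(vec) for vec in row] for row in fmap]
    flat = [s for row in sums for s in row]
    threshold = sum(flat) / len(flat)
    selected = [
        vec
        for row, sum_row in zip(fmap, sums)
        for vec, s in zip(row, sum_row)
        if s > threshold
    ]
    # Fall back to all descriptors if no position stands out (uniform map).
    return selected or [vec for row in fmap for vec in row]


def aggregate(selected: List[Vector]) -> Vector:
    """Concatenate average pooling and max pooling over the kept descriptors."""
    dim = len(selected[0])
    avg = [sum(v[d] for v in selected) / len(selected) for d in range(dim)]
    mx = [max(v[d] for v in selected) for d in range(dim)]
    return avg + mx


# Toy 2 x 2 feature map with 2 channels: the top row is weak "background".
toy_map = [
    [[0.0, 0.1], [0.2, 0.0]],
    [[3.0, 1.0], [1.0, 3.0]],
]
feature = aggregate(select_descriptors(toy_map))
print(feature)  # [2.0, 2.0, 3.0, 3.0]
```

In this toy case the two weak positions in the top row fall below the mean activation and are discarded, so the final vector is built only from the two strong descriptors. A retrieval system would then rank images by a similarity measure between such vectors.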


