Spatial-Scale Aligned Network for Fine-Grained Recognition

by Lizhao Gao, et al.
Megvii Technology Limited
Peking University

Existing approaches for fine-grained visual recognition focus on learning marginal region-based representations while neglecting spatial and scale misalignments, which leads to inferior performance. In this paper, we propose the spatial-scale aligned network (SSANET), which implicitly addresses these misalignments during the recognition process. Specifically, SSANET consists of 1) a self-supervised proposal mining formula with Morphological Alignment Constraints; 2) a discriminative scale mining (DSM) module, which exploits the feature pyramid via a circulant matrix and provides a Fourier solver for fast scale alignment; and 3) an oriented pooling (OP) module, which performs the pooling operation in several pre-defined orientations. Each orientation defines one kind of spatial alignment, and the network learns which alignment is optimal. With the two proposed modules, our algorithm automatically determines accurate local proposal regions and generates more robust target representations that are invariant to various appearance variations. Extensive experiments verify that SSANET learns better spatial-scale invariant target representations, yielding superior performance on the fine-grained recognition task on several benchmarks.
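The abstract does not spell out the DSM formulation, but the pairing of a "circulant matrix" with a "Fourier solver" usually rests on one standard identity: every circulant matrix is diagonalized by the discrete Fourier transform, so a linear system built from all cyclic shifts of a vector can be solved per frequency bin in O(n log n) instead of O(n^3). The sketch below illustrates that generic identity on a regularized least-squares problem. It is an assumption-laden illustration, not SSANET's actual DSM module: the 1-D vector `c` (imagined here as, e.g., responses stacked across pyramid scales), the regularizer `lam`, and the helper names `circulant` and `fourier_ridge_solve` are all hypothetical.

```python
import numpy as np


def circulant(c):
    """Dense circulant matrix with first column c: C[i, j] = c[(i - j) % n].

    Hypothetical helper, used only to check the FFT solver against a dense solve.
    """
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]


def fourier_ridge_solve(c, y, lam=1e-2):
    """Solve (C^T C + lam * I) w = C^T y for the circulant C built from real c.

    Illustrative sketch (not the paper's method): since C x is the circular
    convolution of c and x, FFT(C x) = FFT(c) * FFT(x), and C^T corresponds to
    multiplying by conj(FFT(c)). The normal equations therefore decouple into
    independent scalar equations, one per frequency bin.
    """
    c_hat = np.fft.fft(c)
    y_hat = np.fft.fft(y)
    # Per-frequency closed form: w_hat = conj(c_hat) * y_hat / (|c_hat|^2 + lam)
    w_hat = np.conj(c_hat) * y_hat / (np.conj(c_hat) * c_hat + lam)
    # Inputs are real, so any imaginary part is numerical noise.
    return np.real(np.fft.ifft(w_hat))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.standard_normal(64)
    y = rng.standard_normal(64)
    C = circulant(c)
    w_fft = fourier_ridge_solve(c, y, lam=0.1)
    # Reference dense solve costs O(n^3); the FFT route costs O(n log n).
    w_dense = np.linalg.solve(C.T @ C + 0.1 * np.eye(64), C.T @ y)
    print(np.allclose(w_fft, w_dense))  # expected: True
```

The dense solve is included only as a correctness check; the point of the Fourier route is that the cost no longer depends on forming or inverting the n-by-n circulant matrix.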




TOAN: Target-Oriented Alignment Network for Fine-Grained Image Categorization with Few Labeled Samples

The challenges of high intra-class variance yet low inter-class fluctuat...

Self-aligned Spatial Feature Extraction Network for UAV Vehicle Re-identification

Compared with existing vehicle re-identification (ReID) tasks conducted ...

Fine-Grained Visual Classification via Simultaneously Learning of Multi-regional Multi-grained Features

Fine-grained visual classification is a challenging task that recognizes...

Simultaneous Region Localization and Hash Coding for Fine-grained Image Retrieval

Fine-grained image hashing is a challenging problem due to the difficult...

Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment

The egocentric and exocentric viewpoints of a human activity look dramat...

Focus On Details: Online Multi-object Tracking with Diverse Fine-grained Representation

Discriminative representation is essential to keep a unique identifier f...

Leveraging Spatial Information in Radiology Reports for Ischemic Stroke Phenotyping

Classifying fine-grained ischemic stroke phenotypes relies on identifyin...
