Grad-CAM guided channel-spatial attention module for fine-grained visual classification

01/24/2021
by Shuai Xu, et al.

Fine-grained visual classification (FGVC) is becoming an important research field, owing to its wide range of applications and the rapid development of computer vision technologies. Current state-of-the-art (SOTA) FGVC methods usually employ attention mechanisms to first capture the semantic parts and then discover the subtle differences between those parts across distinct classes. Channel-spatial attention mechanisms, which focus on the discriminative channels and regions simultaneously, have significantly improved classification performance. However, existing attention modules are poorly guided, because part-based detectors in FGVC rely on the network's own learning ability without the supervision of part annotations. Since obtaining such part annotations is labor-intensive, visual localization and explanation methods such as gradient-weighted class activation mapping (Grad-CAM) can be used to supervise the attention mechanism instead. We propose a Grad-CAM guided channel-spatial attention module for FGVC, which uses the coarse localization maps generated by Grad-CAM to supervise and constrain the attention weights. To demonstrate the effectiveness of the proposed method, we conduct comprehensive experiments on three popular FGVC datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. The proposed method outperforms the SOTA attention modules on the FGVC task. In addition, visualizations of the feature maps demonstrate the superiority of the proposed method over the SOTA approaches.
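
The idea of letting Grad-CAM supervise attention weights can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the loss, so `guidance_loss` (a mean squared error between min-max normalized tensors), the helper names, and the toy classifier head (global average pooling followed by a linear layer, for which the gradient of a class score with respect to the feature maps has a closed form) are all assumptions made for illustration. Only the Grad-CAM step itself, channel weights as spatially averaged gradients and a map formed as a ReLU of the weighted sum of feature maps, follows the standard definition.

```python
import numpy as np

def grad_cam(features, grads):
    """Standard Grad-CAM on one feature tensor.

    features, grads: arrays of shape (C, H, W); grads holds the gradient of
    the target class score with respect to each feature-map entry.
    Returns the channel weights (C,) and the coarse localization map (H, W).
    """
    alpha = grads.mean(axis=(1, 2))            # average gradients over space
    cam = np.maximum(np.tensordot(alpha, features, axes=1), 0.0)  # ReLU
    return alpha, cam

def normalize(x, eps=1e-8):
    """Min-max scale to roughly [0, 1] so the two tensors are comparable."""
    x = x - x.min()
    return x / (x.max() + eps)

def guidance_loss(attention, target):
    """Hypothetical supervision term (an assumption, not from the paper):
    MSE between normalized attention weights and a Grad-CAM-derived target."""
    return float(np.mean((normalize(attention) - normalize(target)) ** 2))

rng = np.random.default_rng(0)
C, H, W, num_classes = 8, 7, 7, 5
features = rng.random((C, H, W))               # toy backbone output
fc = rng.standard_normal((num_classes, C))     # toy linear classifier weights
cls = 2                                        # target class index

# For a GAP + linear head, d score_cls / d features[k, i, j] = fc[cls, k] / (H * W).
grads = np.broadcast_to(fc[cls][:, None, None] / (H * W), (C, H, W))
alpha, cam = grad_cam(features, grads)

# Stand-ins for the outputs of a channel-spatial attention module.
channel_attention = rng.random(C)
spatial_attention = rng.random((H, W))

loss = guidance_loss(channel_attention, alpha) + guidance_loss(spatial_attention, cam)
```

In a real training setup the gradients would come from automatic differentiation and a term like `loss` would be added to the classification loss; how the two terms are weighted is left open here because the abstract does not say.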


