Spatial-Temporal Attention Network for Open-Set Fine-Grained Image Recognition

11/25/2022
by Jiayin Sun, et al.

Triggered by the success of transformers in various visual tasks, the spatial self-attention mechanism has recently attracted increasing attention in the computer vision community. However, we empirically found that a typical vision transformer with the spatial self-attention mechanism cannot learn accurate attention maps for distinguishing different categories of fine-grained images. To address this problem, motivated by the temporal attention mechanism in brains, we propose a spatial-temporal attention network, called STAN, for learning fine-grained feature representations, where the features learnt by a sequence of spatial self-attention operations corresponding to multiple moments are aggregated progressively. The proposed STAN consists of four modules: a self-attention backbone module that learns a sequence of features with self-attention operations, a spatial feature self-organizing module that facilitates model training, a spatial-temporal feature learning module that aggregates the re-organized features via a Long Short-Term Memory (LSTM) network, and a context-aware module, implemented as the forget block of the spatial-temporal feature learning module, that preserves or forgets the long-term memory by utilizing contextual information. We then propose a STAN-based method for open-set fine-grained recognition, called STAN-OSFGR, which integrates the proposed STAN with a linear classifier. Extensive experiments on three fine-grained datasets and two coarse-grained datasets demonstrate that STAN-OSFGR significantly outperforms nine state-of-the-art open-set recognition methods in most cases.
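The abstract's core idea can be illustrated with a minimal sketch: spatial self-attention is applied repeatedly to a token map (one application per "moment"), and the pooled feature from each moment is aggregated progressively by an LSTM cell whose forget gate decides how much long-term memory to preserve. This is an illustrative NumPy toy, not the authors' implementation; all dimensions, the mean-pooling step, and the random weights are assumptions standing in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tokens, T, n_classes = 16, 8, 4, 5  # feature dim, spatial tokens, moments, classes

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_self_attention(tokens, wq, wk, wv):
    """One spatial self-attention operation over the token map (one 'moment')."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (n_tokens, n_tokens) attention map
    return attn @ v

def lstm_step(h, c, x, w):
    """LSTM cell aggregating per-moment features; the forget gate f plays the
    role of the context-aware module, preserving/forgetting long-term memory."""
    z = np.concatenate([h, x])
    f = sigmoid(z @ w["f"])               # forget gate
    i = sigmoid(z @ w["i"])               # input gate
    o = sigmoid(z @ w["o"])               # output gate
    c = f * c + i * np.tanh(z @ w["c"])   # update long-term memory
    return o * np.tanh(c), c

# Random stand-ins for learned parameters (hypothetical, for illustration only).
attn_w = [tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(3)) for _ in range(T)]
lstm_w = {g: rng.standard_normal((2 * d, d)) * 0.1 for g in "fioc"}
w_cls = rng.standard_normal((d, n_classes)) * 0.1

tokens = rng.standard_normal((n_tokens, d))  # backbone feature map for one image
h, c = np.zeros(d), np.zeros(d)
for t in range(T):                           # sequence of self-attention moments
    tokens = spatial_self_attention(tokens, *attn_w[t])
    h, c = lstm_step(h, c, tokens.mean(axis=0), lstm_w)  # aggregate progressively

logits = h @ w_cls                           # linear classifier on aggregated feature
print(logits.shape)                          # (5,)
```

In the full STAN-OSFGR method the aggregated feature would additionally be scored for open-set rejection; this sketch only shows the progressive spatial-temporal aggregation that the abstract describes.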

