Exploring Vision Transformers for Fine-grained Classification

06/19/2021
by Marcos V. Conde, et al.

Existing computer vision research in categorization struggles with fine-grained attribute recognition because of the inherently high intra-class variance and low inter-class variance. State-of-the-art (SOTA) methods tackle this challenge by locating the most informative image regions and relying on them to classify the complete image. More recently, the Vision Transformer (ViT) has shown strong performance in both traditional and fine-grained classification tasks. In this work, we propose a multi-stage ViT framework for fine-grained image classification that localizes the informative image regions using the inherent multi-head self-attention mechanism, without requiring architectural changes. We also introduce attention-guided augmentations to improve the model's capabilities. We demonstrate the value of our approach through experiments on four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology. We also illustrate our model's interpretability via qualitative results.
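To make the idea of attention-based localization concrete, the following is a minimal sketch of how class-token attention weights could be turned into a crop box. It is not the authors' implementation: the abstract does not say how heads or layers are aggregated or how the region is selected. The function name `attention_bbox`, the head-averaging step, and the mean-threshold rule are all assumptions made for illustration.

```python
from typing import List, Tuple


def attention_bbox(
    cls_attention: List[List[float]],
    grid_h: int,
    grid_w: int,
    patch_size: int,
) -> Tuple[int, int, int, int]:
    """Return a pixel box (left, top, right, bottom) around high-attention patches.

    cls_attention[h][p] is the attention weight from the class token to
    patch p in head h, with patches stored in row-major order.
    Hypothetical helper: the aggregation and threshold are assumptions.
    """
    num_patches = grid_h * grid_w
    if not cls_attention or any(len(head) != num_patches for head in cls_attention):
        raise ValueError("each head must have grid_h * grid_w weights")

    # Average over heads to get one saliency score per patch (assumed aggregation).
    num_heads = len(cls_attention)
    scores = [
        sum(head[p] for head in cls_attention) / num_heads
        for p in range(num_patches)
    ]

    # Assumed selection rule: keep patches that score above the mean.
    threshold = sum(scores) / num_patches
    kept = [p for p, score in enumerate(scores) if score > threshold]
    if not kept:
        # Uniform attention gives no preferred region: fall back to the full image.
        return (0, 0, grid_w * patch_size, grid_h * patch_size)

    # Convert the kept patch indices to a bounding box in pixel coordinates.
    rows = [p // grid_w for p in kept]
    cols = [p % grid_w for p in kept]
    return (
        min(cols) * patch_size,
        min(rows) * patch_size,
        (max(cols) + 1) * patch_size,
        (max(rows) + 1) * patch_size,
    )


# Toy example: a 3x3 patch grid, 16-pixel patches, two attention heads.
head_a = [0.02, 0.02, 0.02, 0.02, 0.45, 0.41, 0.02, 0.02, 0.02]
head_b = [0.02, 0.02, 0.02, 0.02, 0.50, 0.02, 0.02, 0.36, 0.02]
box = attention_bbox([head_a, head_b], grid_h=3, grid_w=3, patch_size=16)
print(box)  # (16, 16, 48, 48)
```

In a multi-stage setup, the image region inside such a box could be cropped, resized, and passed through the same network again, which is one way localization can work without architectural changes. Whether the paper uses this exact procedure is not stated in the abstract.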


research
03/14/2021

TransFG: A Transformer Architecture for Fine-grained Recognition

Fine-grained visual classification (FGVC) which aims at recognizing obje...
research
08/17/2022

Conviformers: Convolutionally guided Vision Transformer

Vision transformers are nowadays the de-facto preference for image class...
research
02/09/2023

Drawing Attention to Detail: Pose Alignment through Self-Attention for Fine-Grained Object Classification

Intra-class variations in the open world lead to various challenges in c...
research
10/17/2013

Fine-grained Categorization -- Short Summary of our Entry for the ImageNet Challenge 2012

In this paper, we tackle the problem of visual categorization of dog bre...
research
09/20/2022

Fine-grained Classification of Solder Joints with α-skew Jensen-Shannon Divergence

Solder joint inspection (SJI) is a critical process in the production of...
research
03/08/2022

Coarse-to-Fine Vision Transformer

Vision Transformers (ViT) have made many breakthroughs in computer visio...
research
05/11/2023

Salient Mask-Guided Vision Transformer for Fine-Grained Classification

Fine-grained visual classification (FGVC) is a challenging computer visi...
