DeepAI AI Chat
Log In Sign Up

Exploring Vision Transformers for Fine-grained Classification

06/19/2021
by   Marcos V. Conde, et al.
1

Existing computer vision research in categorization struggles with fine-grained attributes recognition due to the inherently high intra-class variances and low inter-class variances. SOTA methods tackle this challenge by locating the most informative image regions and rely on them to classify the complete image. The most recent work, Vision Transformer (ViT), shows its strong performance in both traditional and fine-grained classification tasks. In this work, we propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes using the inherent multi-head self-attention mechanism. We also introduce attention-guided augmentations for improving the model's capabilities. We demonstrate the value of our approach by experimenting with four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology. We also prove our model's interpretability via qualitative results.

READ FULL TEXT

page 2

page 3

page 4

03/14/2021

TransFG: A Transformer Architecture for Fine-grained Recognition

Fine-grained visual classification (FGVC) which aims at recognizing obje...
08/17/2022

Conviformers: Convolutionally guided Vision Transformer

Vision transformers are nowadays the de-facto preference for image class...
02/09/2023

Drawing Attention to Detail: Pose Alignment through Self-Attention for Fine-Grained Object Classification

Intra-class variations in the open world lead to various challenges in c...
10/17/2013

Fine-grained Categorization -- Short Summary of our Entry for the ImageNet Challenge 2012

In this paper, we tackle the problem of visual categorization of dog bre...
09/20/2022

Fine-grained Classification of Solder Joints with α-skew Jensen-Shannon Divergence

Solder joint inspection (SJI) is a critical process in the production of...
03/08/2022

Coarse-to-Fine Vision Transformer

Vision Transformers (ViT) have made many breakthroughs in computer visio...
05/17/2021

A Fine-Grained Visual Attention Approach for Fingerspelling Recognition in the Wild

Fingerspelling in sign language has been the means of communicating tech...