Conviformers: Convolutionally guided Vision Transformer

08/17/2022
by Mohit Vaishnav, et al.

Vision transformers are now the de facto choice for image classification. Classification tasks fall into two broad categories: fine-grained and coarse-grained. Fine-grained classification requires discovering subtle differences, because sub-classes are highly similar to one another. Such distinctions are often lost when the image is downscaled to save the memory and computational cost associated with vision transformers (ViT). In this work, we present an in-depth analysis and describe the critical components for developing a system for the fine-grained categorization of plants from herbarium sheets. Our extensive experimental analysis indicated the need for a better augmentation technique and the ability of modern neural networks to handle higher-dimensional images. We also introduce a convolutional transformer architecture called Conviformer which, unlike the popular Vision Transformer (ConViT), can handle higher-resolution images without exploding memory and computational cost. We further introduce a novel, improved pre-processing technique called PreSizer, which resizes images while preserving their original aspect ratios; this proved essential for classifying natural plants. With our simple yet effective approach, we achieved SoTA on the Herbarium 202x and iNaturalist 2019 datasets.
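
To make the resolution trade-off in the abstract concrete, here is a minimal Python sketch. It is not the paper's Conviformer or PreSizer: the function names, the 16-pixel patch size, and the pad-to-square strategy are all assumptions chosen for illustration. It shows how the number of patch tokens (and the size of one self-attention matrix) grows with input resolution in a plain ViT-style model, and how a resize can keep the aspect ratio by scaling the longer side and padding the rest.

```python
def num_patch_tokens(height: int, width: int, patch_size: int = 16) -> int:
    """Number of non-overlapping patches a ViT-style model would see.

    patch_size=16 is an assumed, commonly used value, not taken from the paper.
    """
    return (height // patch_size) * (width // patch_size)


def attention_entries(tokens: int) -> int:
    """Entries in a single self-attention matrix (tokens x tokens)."""
    return tokens * tokens


def fit_with_padding(width: int, height: int, target: int):
    """Scale the longer side to `target` while keeping the aspect ratio.

    Returns (new_width, new_height, pad_x, pad_y), where pad_x and pad_y are
    the total padding needed to reach a target x target canvas. This is a
    generic letterbox-style scheme, NOT the authors' PreSizer.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    return new_w, new_h, target - new_w, target - new_h


if __name__ == "__main__":
    for side in (224, 448):
        tokens = num_patch_tokens(side, side)
        print(side, tokens, attention_entries(tokens))
    # A tall, sheet-like image of 2000 x 3000 pixels (hypothetical size).
    print(fit_with_padding(2000, 3000, 448))
```

Under these assumptions, doubling the side length from 224 to 448 quadruples the token count (196 to 784) and multiplies the attention matrix size by 16, which is the cost pressure that motivates downscaling in the first place.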


