Performance Evaluation of Swin Vision Transformer Model using Gradient Accumulation Optimization Technique

07/31/2023
by Sanad Aburass, et al.

Vision Transformers (ViTs) have emerged as a promising approach for visual recognition tasks, revolutionizing the field by leveraging the power of transformer-based architectures. Among the various ViT models, Swin Transformers have gained considerable attention due to their hierarchical design and their ability to capture both local and global visual features effectively. This paper evaluates the performance of the Swin ViT model when it is trained with the gradient accumulation optimization (GAO) technique. We investigate the impact of GAO on the model's accuracy and training time. Our experiments show that applying GAO leads to a significant decrease in the accuracy of the Swin ViT model compared to the standard Swin Transformer model. Moreover, we observe a significant increase in the training time of the Swin ViT model when GAO is applied. These findings suggest that GAO may not be suitable for the Swin ViT model, and that caution should be exercised when using GAO with other transformer-based models.
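For readers unfamiliar with the technique, gradient accumulation in general sums the gradients of several small micro-batches and applies a single weight update afterwards, which approximates training with a larger batch. The sketch below illustrates that general idea on a toy one-parameter least-squares model in plain Python. It is not the authors' implementation, and the abstract does not state their configuration; the names grad_mse, train_step_accumulated, accum_steps, and lr are hypothetical and chosen only for illustration.

```python
# Minimal sketch of gradient accumulation on a 1-D model y ~ w * x.
# Assumption: equal-size micro-batches and a plain gradient-descent update.
# All function and parameter names are illustrative, not from the paper.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def train_step_accumulated(w, xs, ys, accum_steps, lr):
    """One weight update built from `accum_steps` equal-size micro-batches."""
    micro = len(xs) // accum_steps
    accumulated = 0.0
    for k in range(accum_steps):
        lo, hi = k * micro, (k + 1) * micro
        # Divide by accum_steps so the running sum is an average over micro-batches.
        accumulated += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    # The weight changes only once, after every micro-batch has contributed.
    return w - lr * accumulated


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# A single step on the full batch of four samples.
w_full = 0.0 - 0.01 * grad_mse(0.0, xs, ys)
# The same step computed as two accumulated micro-batches of two samples each.
w_accum = train_step_accumulated(0.0, xs, ys, accum_steps=2, lr=0.01)
```

In this toy setting the accumulated step matches the full-batch step, because the loss is a plain mean over samples and the micro-batches have equal size. Real networks can deviate from this equivalence (for example through batch-dependent layers or per-step learning-rate schedules), and each update requires several forward and backward passes, which is one way accumulation can lengthen wall-clock training time.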


