SparseSwin: Swin Transformer with Sparse Transformer Block

09/11/2023
by Krisna Pinasthika, et al.

Advancements in computer vision research have established the transformer architecture as the state of the art in computer vision tasks. One known drawback of the transformer architecture is its high number of parameters, which can lead to a more complex and inefficient algorithm. This paper aims to reduce the number of parameters and, in turn, make the transformer more efficient. We present the Sparse Transformer (SparTa) Block, a modified transformer block with the addition of a sparse token converter that reduces the number of tokens used. We use the SparTa Block inside the Swin-T architecture (SparseSwin) to leverage Swin's capability to downsample its input and reduce the number of initial tokens to be computed. The proposed SparseSwin model outperforms other state-of-the-art models in image classification, with accuracies of 86.96% and 85.35%. Despite having fewer parameters, this result highlights the potential of a transformer architecture using a sparse token converter with a limited number of tokens to optimize the use of the transformer and improve its performance.
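For intuition, below is a minimal PyTorch sketch of the sparse-token-converter idea described in the abstract: a learned projection that maps a large set of input tokens down to a small, fixed number of tokens before a standard transformer block is applied. This is an illustrative reconstruction, not the authors' implementation; the module names, the reduced token count of 49, and all layer dimensions are assumptions.

```python
# Illustrative sketch of a SparTa-style block (assumptions noted above):
# a sparse token converter reduces (B, N, C) tokens to (B, K, C_out) with
# K << N, then a regular transformer block attends over the K tokens only,
# so self-attention cost drops from O(N^2) to O(K^2).
import torch
import torch.nn as nn


class SparseTokenConverter(nn.Module):
    """Maps (B, N, C) tokens to (B, K, C_out) via learned projections (assumed design)."""

    def __init__(self, in_tokens: int, in_dim: int, num_sparse_tokens: int, out_dim: int):
        super().__init__()
        self.token_proj = nn.Linear(in_tokens, num_sparse_tokens)  # N -> K over the token axis
        self.channel_proj = nn.Linear(in_dim, out_dim)             # C -> C_out over channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, N, C) -> (B, C, N) -> (B, C, K) -> (B, K, C) -> (B, K, C_out)
        x = self.token_proj(x.transpose(1, 2)).transpose(1, 2)
        return self.channel_proj(x)


class SparTaBlock(nn.Module):
    """Sparse token converter followed by a standard pre-norm transformer block."""

    def __init__(self, in_tokens: int, in_dim: int, num_sparse_tokens: int = 49,
                 dim: int = 512, num_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.converter = SparseTokenConverter(in_tokens, in_dim, num_sparse_tokens, dim)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.converter(x)  # (B, K, dim): attention now runs on K tokens only
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    # Example input: tokens from a downsampled Swin stage, e.g. a 14x14 map with 384 channels.
    feats = torch.randn(2, 14 * 14, 384)
    block = SparTaBlock(in_tokens=14 * 14, in_dim=384)
    print(block(feats).shape)  # torch.Size([2, 49, 512])
```

The key design point, as far as the abstract describes it, is that the converter fixes the number of tokens the transformer block must process regardless of the input resolution, which is where the parameter and compute savings come from.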

