Pattern Attention Transformer with Doughnut Kernel

11/30/2022
by Wenyuan Sheng, et al.

In this paper we present a new architecture, the Pattern Attention Transformer (PAT), built from a new doughnut kernel. Unlike tokens in NLP, Transformers in computer vision must cope with the high resolution of pixels in images. Inheriting the patch/window idea from ViT and its follow-ups, the doughnut kernel refines the design of patches: it replaces hard line-cut boundaries with two types of areas, sensor and updating, based on an understanding of self-attention (named the QKVA grid). The doughnut kernel also raises a new question: the shape of kernels. To verify its performance on image classification, PAT is built with Transformer blocks using doughnut kernels of regular octagon shape. Its performance on ImageNet-1K surpasses the Swin Transformer (+0.7 top-1 accuracy).
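As a rough illustration of the sensor/updating split described above, here is a minimal PyTorch sketch (an assumption based on the abstract, not the authors' code): the tokens of a small central updating area attend over a larger surrounding sensor area, so adjacent kernels can overlap instead of meeting at hard line-cut window boundaries. Learned Q/K/V projections, multi-head attention, and the octagonal mask are omitted for brevity, and all names and window sizes are hypothetical.

```python
# Hypothetical sketch of doughnut-kernel attention, inferred from the abstract.
import torch

def doughnut_attention(x, s=8, u=4):
    """x: (B, H, W, C) feature map.
    s: side length of the square "sensor" area (supplies keys/values).
    u: side length of the inner "updating" area (supplies queries), u < s.
    Refreshes the central u*u tokens of one centred window by attention
    over the surrounding s*s sensor area."""
    B, H, W, C = x.shape
    top, left = (H - s) // 2, (W - s) // 2            # place one sensor window
    sensor = x[:, top:top + s, left:left + s, :]      # (B, s, s, C)
    off = (s - u) // 2                                # updating area is centred
    q = sensor[:, off:off + u, off:off + u, :].reshape(B, u * u, C)
    kv = sensor.reshape(B, s * s, C)                  # keys/values: whole sensor
    attn = torch.softmax(q @ kv.transpose(1, 2) / C ** 0.5, dim=-1)
    out = (attn @ kv).reshape(B, u, u, C)             # new values for updating area
    x = x.clone()
    x[:, top + off:top + off + u, left + off:left + off + u, :] = out
    return x

# Toy usage: only the central 4x4 tokens change; the rest of the map is untouched.
x = torch.randn(2, 16, 16, 64)
y = doughnut_attention(x)
```

In the full model, such windows would presumably tile the feature map so that every token falls in some updating area while sensor areas overlap, and the regular-octagon variant would mask the corner tokens out of the sensor set.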
