
Pattern Attention Transformer with Doughnut Kernel

by Wenyuan Sheng et al.

In this paper we present a new architecture, the Pattern Attention Transformer (PAT), built on a new doughnut kernel. Unlike tokens in NLP, Transformers in computer vision must cope with the high resolution of image pixels. Inheriting the patch/window idea from ViT and its successors, the doughnut kernel refines the design of patches: it replaces hard line-cut boundaries with two types of areas, a sensor area and an updating area, based on an interpretation of self-attention (named the QKVA grid). The doughnut kernel also raises a new question about the shape of kernels. To verify its performance on image classification, PAT is designed with Transformer blocks whose doughnut kernels have a regular-octagon shape. Its top-1 accuracy on ImageNet-1K surpasses the Swin Transformer by 0.7.
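The core idea above, a small "updating" area whose tokens attend over a larger surrounding "sensor" area, can be illustrated with a toy sketch. The function below is a hypothetical 1-D simplification (scalar tokens, scalar q·k scores, single window), not the paper's actual kernel; the names `update_radius` and `sensor_radius` are assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def doughnut_window_update(x, center, update_radius, sensor_radius):
    """Toy 1-D doughnut-style attention step: only tokens inside the small
    updating window around `center` are rewritten, but each of them attends
    over the larger sensor window, so context flows in across what would
    otherwise be a hard patch boundary."""
    assert sensor_radius >= update_radius
    n = len(x)
    # Sensor window: source of keys/values, wider than the updated region.
    s_lo, s_hi = max(0, center - sensor_radius), min(n, center + sensor_radius + 1)
    keys = x[s_lo:s_hi]
    out = list(x)
    # Updating window: only these positions receive new values.
    for i in range(max(0, center - update_radius), min(n, center + update_radius + 1)):
        weights = softmax([x[i] * k for k in keys])  # scalar q·k scores
        out[i] = sum(w * v for w, v in zip(weights, keys))
    return out
```

For example, with `update_radius=0` only the center token is rewritten, yet its new value mixes in all tokens within `sensor_radius`; in the full model this asymmetry between sensed and updated areas is what replaces the line-cut window boundary.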



Related research:
- Kernel Attention Transformer (KAT) for Histopathology Whole Slide Image Classification
- Visualizing and Understanding Patch Interactions in Vision Transformer
- Novel Convolution Kernels for Computer Vision and Shape Analysis based on Electromagnetism
- CabViT: Cross Attention among Blocks for Vision Transformer
- What Makes for Hierarchical Vision Transformer?
- O-ViT: Orthogonal Vision Transformer
- Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images