Pattern Attention Transformer with Doughnut Kernel

11/30/2022
by Wenyuan Sheng, et al.

This paper presents a new architecture, the Pattern Attention Transformer (PAT), built on a new doughnut kernel. Unlike Transformers in NLP, which operate on tokens, Transformers in computer vision must cope with the high pixel resolution of images. Inheriting the patch/window idea from ViT and its follow-ups, the doughnut kernel refines the design of patches: it replaces line-cut boundaries with two types of areas, sensor and updating, a design based on an understanding of self-attention named the QKVA grid. The doughnut kernel also raises a new question about the shape of kernels. To verify its performance on image classification, PAT is built from Transformer blocks that use doughnut kernels in the shape of a regular octagon. On ImageNet-1K, it surpasses the Swin Transformer by 0.7 in top-1 accuracy (acc1).
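
To make the idea of a shaped kernel with two kinds of areas concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The abstract does not say how the two areas are laid out, so the sketch assumes the updating area is an inner regular octagon and the sensor area is the ring between it and a larger outer octagon. The function names `octagon_mask` and `doughnut_areas` and the `apothem` parameters are hypothetical.

```python
import numpy as np


def octagon_mask(size, apothem):
    """Boolean mask of a regular octagon centred on a size x size grid.

    A regular octagon with apothem a is the set of points whose offsets
    (dx, dy) from the centre satisfy max(|dx|, |dy|) <= a and
    (|dx| + |dy|) / sqrt(2) <= a.
    """
    centre = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    dx = np.abs(xs - centre)
    dy = np.abs(ys - centre)
    axis_ok = np.maximum(dx, dy) <= apothem          # four axis-aligned sides
    diag_ok = (dx + dy) / np.sqrt(2.0) <= apothem    # four diagonal sides
    return axis_ok & diag_ok


def doughnut_areas(size, outer_apothem, inner_apothem):
    """Split an octagonal kernel into two disjoint areas.

    Assumption (not stated in the abstract): the updating area is the
    inner octagon and the sensor area is the surrounding ring.
    """
    outer = octagon_mask(size, outer_apothem)
    updating = octagon_mask(size, inner_apothem)
    sensor = outer & ~updating
    return sensor, updating


if __name__ == "__main__":
    sensor, updating = doughnut_areas(size=9, outer_apothem=4, inner_apothem=2)
    # Print a small picture: 'U' = updating, 'S' = sensor, '.' = outside.
    for row_s, row_u in zip(sensor, updating):
        print("".join("U" if u else "S" if s else "." for s, u in zip(row_s, row_u)))
```

Under this reading, pixels outside the outer octagon belong to neither area, which is how a non-rectangular kernel departs from the straight patch boundaries used in ViT-style windows.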

06/27/2022

Kernel Attention Transformer (KAT) for Histopathology Whole Slide Image Classification

Transformer has been widely used in histopathology whole slide image (WS...
03/11/2022

Visualizing and Understanding Patch Interactions in Vision Transformer

Vision Transformer (ViT) has become a leading tool in various computer v...
06/20/2018

Novel Convolution Kernels for Computer Vision and Shape Analysis based on Electromagnetism

Computer vision is a growing field with a lot of new applications in aut...
11/14/2022

CabViT: Cross Attention among Blocks for Vision Transformer

Since the vision transformer (ViT) has achieved impressive performance i...
07/05/2021

What Makes for Hierarchical Vision Transformer?

Recent studies show that hierarchical Vision Transformer with interleave...
01/28/2022

O-ViT: Orthogonal Vision Transformer

Inspired by the tremendous success of the self-attention mechanism in na...
10/28/2022

Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images

Very high-resolution (VHR) remote sensing (RS) image classification is t...