Sequence and Circle: Exploring the Relationship Between Patches

10/18/2022
by Zhengyang Yu, et al.

The vision transformer (ViT) has achieved state-of-the-art results in various vision tasks. It uses a learnable position embedding (PE) mechanism to encode the location of each image patch. However, it is unclear whether this learnable PE is really necessary and what its benefits are. This paper explores two alternative ways of encoding the location of individual patches that exploit prior knowledge about their spatial arrangement: the sequence relationship embedding (SRE) and the circle relationship embedding (CRE). The SRE treats all patches as an ordered sequence in which adjacent patches are separated by the same interval. The CRE treats the central patch as the center of a circle and measures the distance of each remaining patch from that center based on the four-neighborhood principle, so that the patches are grouped onto multiple concentric circles with different radii. We implemented these two relationships on three classic ViTs and tested them on four popular datasets. Experiments show that SRE and CRE can replace PE, reducing the random learnable parameters while achieving the same performance, and that combining SRE or CRE with PE performs better than using PE alone.
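
To make the two relationships concrete, the following minimal sketch computes a raw relationship value for every patch in a small grid. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how these values are turned into embedding vectors, the function names are hypothetical, the sequence order is assumed to be row-major, and the "four neighborhoods principle" is read here as city-block (Manhattan) distance from the central patch.

```python
def sequence_relationship(grid_h, grid_w):
    """Assumed SRE values: each patch's index in row-major order.

    Adjacent patches in the sequence differ by a constant interval of 1.
    """
    return [[r * grid_w + c for c in range(grid_w)] for r in range(grid_h)]


def circle_relationship(grid_h, grid_w):
    """Assumed CRE values: city-block distance from the central patch.

    Patches with the same value lie on the same "circle"; the value is
    the radius. For even grid sizes the center is taken as the patch at
    (grid_h // 2, grid_w // 2), which is an arbitrary choice made here.
    """
    center_r, center_c = grid_h // 2, grid_w // 2
    return [
        [abs(r - center_r) + abs(c - center_c) for c in range(grid_w)]
        for r in range(grid_h)
    ]


if __name__ == "__main__":
    # Print both relationship maps for a 3x3 grid of patches.
    for row in sequence_relationship(3, 3):
        print(row)
    for row in circle_relationship(3, 3):
        print(row)
```

For a 3x3 grid, the sequence values run from 0 to 8 row by row, while the circle values are 0 at the central patch, 1 at its four direct neighbors, and 2 at the corners. Unlike a learnable PE, both maps are fixed by the grid shape and contain no trainable parameters.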


research
05/08/2023

Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields

Vision transformers (ViTs) that model an image as a sequence of partitio...
research
08/30/2021

Exploring and Improving Mobile Level Vision Transformers

We study the vision transformer structure in the mobile level in this pa...
research
03/11/2022

Visualizing and Understanding Patch Interactions in Vision Transformer

Vision Transformer (ViT) has become a leading tool in various computer v...
research
06/05/2021

An End-to-End Breast Tumour Classification Model Using Context-Based Patch Modelling- A BiLSTM Approach for Image Classification

Researchers working on computational analysis of Whole Slide Images (WSI...
research
09/20/2022

Graph Reasoning Transformer for Image Parsing

Capturing the long-range dependencies has empirically proven to be effec...
research
10/29/2021

PEDENet: Image Anomaly Localization via Patch Embedding and Density Estimation

A neural network targeting at unsupervised image anomaly localization, c...
