Patch Is Not All You Need

08/21/2023
by Changzhen Li, et al.

Vision Transformers have achieved great success in computer vision, delivering exceptional performance across various tasks. However, their inherent reliance on sequential input forces images to be manually partitioned into patch sequences, which disrupts the image's structural and semantic continuity. To address this, we propose a novel Pattern Transformer (Patternformer) that adaptively converts images to pattern sequences for Transformer input. Specifically, we employ a Convolutional Neural Network to extract various patterns from the input image, with each channel representing a unique pattern that is fed into the succeeding Transformer as a visual token. Because the network is free to optimize these patterns, each pattern concentrates on its own local region of interest, thereby preserving its intrinsic structural and semantic information. Using only a vanilla ResNet and Transformer, we achieve state-of-the-art performance on CIFAR-10 and CIFAR-100 and competitive results on ImageNet.
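To make the contrast between patch sequences and pattern sequences concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the single-channel toy image, and the three hand-written 3x3 filters are assumptions chosen for illustration. In the method described above the patterns come from a ResNet whose filters are learned, whereas the fixed filters here only stand in for that step.

```python
# Illustrative sketch only. All names and the fixed toy filters are assumptions;
# the abstract describes learned ResNet features, not hand-written kernels.

def patch_tokens(image, patch):
    """Split an H x W image into non-overlapping patch x patch tokens (ViT-style)."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tokens.append([image[top + i][left + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def conv2d_valid(image, kernel):
    """Cross-correlate one image with one square kernel (no padding, stride 1)."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(width - k + 1)]
            for r in range(height - k + 1)]


def pattern_tokens(image, kernels):
    """Produce one token per output channel by flattening each feature map."""
    tokens = []
    for kernel in kernels:
        feature_map = conv2d_valid(image, kernel)
        tokens.append([value for row in feature_map for value in row])
    return tokens


# Toy 6 x 6 single-channel image with pixel value r * 6 + c.
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]

# Stand-ins for learned filters: identity, horizontal gradient, vertical gradient.
kernels = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
]

patches = patch_tokens(image, patch=3)
patterns = pattern_tokens(image, kernels)

# Patch tokens: count depends on the grid, each token sees one fixed region.
print(len(patches), len(patches[0]))
# Pattern tokens: count equals the number of channels, each token spans the whole map.
print(len(patterns), len(patterns[0]))
```

In this sketch the sequence length of the pattern representation is set by the number of channels rather than by the patch grid, and every token is computed from the full image instead of a manually cut region.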


