Interpret Vision Transformers as ConvNets with Dynamic Convolutions

09/19/2023
by Chong Zhou, et al.

There has been a long-standing debate over whether vision Transformers or ConvNets are superior as backbones for computer vision models. Although the two are usually treated as completely different architectures, in this paper we interpret vision Transformers as ConvNets with dynamic convolutions, which lets us characterize existing Transformers and dynamic ConvNets in a unified framework and compare their design choices side by side. Our interpretation can also guide network design, since researchers can now consider vision Transformers from the design space of ConvNets and vice versa. We demonstrate this potential through two case studies. First, we inspect the role of softmax in vision Transformers as an activation function and find that it can be replaced by commonly used ConvNet modules, such as ReLU and Layer Normalization, yielding a faster convergence rate and better performance. Second, following the design of depth-wise convolution, we create a corresponding depth-wise vision Transformer that is more efficient while achieving comparable performance. The potential of the proposed unified interpretation is not limited to these examples, and we hope it inspires the community and gives rise to more advanced network architectures.
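The first case study, replacing attention's softmax with ReLU plus a layer-norm-style normalization, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the function names and the precise placement of the normalization are assumptions, and the paper's variant may differ in detail.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(Q, K, V):
    # standard scaled dot-product attention
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def relu_norm_attention(Q, K, V, eps=1e-6):
    # hypothetical variant: softmax replaced by ReLU followed by a
    # layer-norm-style normalization (no affine) over each score row
    d = Q.shape[-1]
    scores = np.maximum(Q @ K.T / np.sqrt(d), 0.0)  # ReLU activation
    mu = scores.mean(axis=-1, keepdims=True)
    sigma = scores.std(axis=-1, keepdims=True)
    weights = (scores - mu) / (sigma + eps)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out_softmax = softmax_attention(Q, K, V)   # shape (4, 8)
out_relu = relu_norm_attention(Q, K, V)    # shape (4, 8)
```

Both variants produce outputs of the same shape; the difference is that the ReLU-plus-normalization weights are no longer constrained to a probability simplex, which is exactly the kind of design choice the unified ConvNet view makes easy to compare.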


Related research

Replacing softmax with ReLU in Vision Transformers (09/15/2023)
Previous research observed accuracy degradation when replacing the atten...

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers (10/12/2022)
This paper studies the curious phenomenon for machine learning models wi...

Efficient 3D Object Reconstruction using Visual Transformers (02/16/2023)
Reconstructing a 3D object from a 2D image is a well-researched vision p...

EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations (06/21/2023)
Equivariant Transformers such as Equiformer have demonstrated the effica...

Efficient Neural Net Approaches in Metal Casting Defect Detection (08/08/2022)
One of the most pressing challenges prevalent in the steel manufacturing...

2-D SSM: A General Spatial Layer for Visual Transformers (06/11/2023)
A central objective in computer vision is to design models with appropri...

Formal Algorithms for Transformers (07/19/2022)
This document aims to be a self-contained, mathematically precise overvi...
