2-D SSM: A General Spatial Layer for Visual Transformers

06/11/2023
by Ethan Baron, et al.

A central objective in computer vision is to design models with appropriate 2-D inductive bias. Desiderata for 2-D inductive bias include two-dimensional position awareness, dynamic spatial locality, and translation and permutation invariance. To address these goals, we leverage an expressive variation of the multidimensional State Space Model (SSM). Our approach introduces an efficient parameterization, accelerated computation, and a suitable normalization scheme. Empirically, we observe that incorporating our layer at the beginning of each transformer block of Vision Transformers (ViT) significantly enhances performance for multiple ViT backbones and across datasets. The new layer is effective even with a negligible amount of additional parameters and inference time. Ablation studies and visualizations demonstrate that the layer has a strong 2-D inductive bias. For example, vision transformers equipped with our layer exhibit effective performance even without positional encoding.
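
The abstract does not spell out the recurrence itself. As a rough illustration of what a two-dimensional state space recurrence can look like, the sketch below runs a scalar-state, Roesser-style recurrence over an image grid, with one state carried horizontally and one carried vertically. The function name ssm_2d_scan, the scalar parameters a1..a4, b1, b2, c1, c2, and the naive double loop are assumptions made for illustration only; they are not the paper's parameterization, normalization, or accelerated computation.

```python
import numpy as np

def ssm_2d_scan(u, a1, a2, a3, a4, b1, b2, c1, c2):
    """Naive scalar-state 2-D recurrence over an H x W input (illustrative only)."""
    H, W = u.shape
    xh = np.zeros((H, W + 1))  # horizontal state: carried from column j to j + 1
    xv = np.zeros((H + 1, W))  # vertical state: carried from row i to i + 1
    y = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            # The output reads the two states that have reached position (i, j).
            y[i, j] = c1 * xh[i, j] + c2 * xv[i, j]
            # Both states mix with each other and with the current input.
            xh[i, j + 1] = a1 * xh[i, j] + a2 * xv[i, j] + b1 * u[i, j]
            xv[i + 1, j] = a3 * xh[i, j] + a4 * xv[i, j] + b2 * u[i, j]
    return y

# Usage: response of the recurrence to a single impulse in the top-left corner.
impulse = np.zeros((4, 4))
impulse[0, 0] = 1.0
response = ssm_2d_scan(impulse, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0)
```

In this toy version each output position depends only on inputs above and to the left of it, and the response to an impulse acts like a 2-D kernel whose decay is governed by the four mixing coefficients. That is one way to picture how such a layer can carry positional and locality information across a grid.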

research
07/05/2021

Vision Xformers: Efficient Attention for Image Classification

Although transformers have become the neural architectures of choice for...
research
06/23/2021

Co-advise: Cross Inductive Bias Distillation

Transformers recently are adapted from the community of natural language...
research
10/31/2022

Studying inductive biases in image classification task

Recently, self-attention (SA) structures became popular in computer visi...
research
08/18/2022

The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs

We present a simple baseline for directly estimating the relative pose (...
research
09/19/2023

Interpret Vision Transformers as ConvNets with Dynamic Convolutions

There has been a debate about the superiority between vision Transformer...
research
12/13/2022

What do Vision Transformers Learn? A Visual Exploration

Vision transformers (ViTs) are quickly becoming the de-facto architectur...
research
01/27/2023

Robust Transformer with Locality Inductive Bias and Feature Normalization

Vision transformers have been demonstrated to yield state-of-the-art res...