Multi-manifold Attention for Vision Transformers

Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly improved by incorporating Convolutional Neural Networks, hierarchical structures and compact forms, there is limited research on ways to utilize additional data representations to refine the attention map derived from the multi-head attention of a Transformer network. This work proposes a novel attention mechanism, called multi-manifold attention, that can substitute for any standard attention mechanism in a Transformer-based network. The proposed attention models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite (SPD) and Grassmann, each with different statistical and geometrical properties, guiding the network to take into consideration a rich set of information describing the appearance, color and texture of an image when computing a highly descriptive attention map. In this way, a Vision Transformer equipped with the proposed attention is guided to attend to more discriminative features, leading to improved classification results, as shown by experiments on several well-known image classification datasets.
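To make the idea of fusing several attention maps concrete, the sketch below shows one way such a mechanism could be wired into a standard Transformer block. It is a minimal PyTorch sketch under simplifying assumptions, not the authors' implementation: the module name MultiManifoldAttention and the hyperparameters are illustrative, the Euclidean branch is ordinary scaled dot-product attention, and the SPD and Grassmann branches are replaced by cheap proxies (second-order feature interactions and cosine similarity between unit-normalised tokens) rather than the true manifold metrics such as the log-Euclidean or projection distance.

```python
# Illustrative sketch only: three attention maps computed from the same queries
# and keys, fused by averaging, then applied to the values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiManifoldAttention(nn.Module):  # hypothetical module, not the paper's code
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each: (B, heads, N, head_dim)

        # Euclidean branch: standard scaled dot-product similarity.
        attn_euc = (q @ k.transpose(-2, -1)) * self.scale

        # SPD-like branch: second-order (squared-feature) interactions as a cheap
        # stand-in for covariance-based similarity on the SPD manifold.
        attn_spd = (q.pow(2) @ k.pow(2).transpose(-2, -1)) * self.scale

        # Grassmann-like branch: cosine similarity between unit-normalised tokens,
        # a proxy for principal-angle similarity between one-dimensional subspaces.
        qn = F.normalize(q, dim=-1)
        kn = F.normalize(k, dim=-1)
        attn_gr = qn @ kn.transpose(-2, -1)

        # Fuse the three attention maps (simple average) and attend to the values.
        attn = (attn_euc.softmax(-1) + attn_spd.softmax(-1) + attn_gr.softmax(-1)) / 3
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 197, 384)                      # e.g. a ViT-Small token sequence
    block = MultiManifoldAttention(dim=384, num_heads=6)
    print(block(x).shape)                             # torch.Size([2, 197, 384])
```

The averaging used to fuse the three maps is only one plausible design choice; a weighted or learned combination of the per-manifold attention maps would fit the same interface.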

