O-ViT: Orthogonal Vision Transformer

01/28/2022
by Yanhong Fei, et al.

Inspired by the tremendous success of the self-attention mechanism in natural language processing, the Vision Transformer (ViT) creatively applies it to image patch sequences and achieves incredible performance. However, the scaled dot-product self-attention of ViT introduces scale ambiguity into the structure of the original feature space. To address this problem, we propose a novel method named Orthogonal Vision Transformer (O-ViT), which optimizes ViT from a geometric perspective. O-ViT constrains the parameters of self-attention blocks to lie on the norm-preserving orthogonal manifold, which keeps the geometry of the feature space intact. Moreover, O-ViT achieves both orthogonal constraints and cheap optimization overhead by adopting a surjective mapping between the orthogonal group and its Lie algebra. We have conducted comparative experiments on image recognition tasks to demonstrate O-ViT's validity; these experiments show that O-ViT can boost the performance of ViT by up to 3.6%.
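The surjective mapping the abstract alludes to is, in standard treatments, the matrix exponential from the Lie algebra so(n) of skew-symmetric matrices onto the special orthogonal group SO(n). The sketch below is a minimal, hypothetical PyTorch illustration of that parameterization idea, not the authors' released code: an unconstrained parameter is skew-symmetrized and exponentiated, so the resulting weight is always orthogonal (and hence norm-preserving), and plain gradient descent on the unconstrained parameter suffices.

```python
# Minimal sketch (not the authors' code): parameterize a square weight
# matrix as W = exp(A - A^T), where A is an unconstrained parameter.
# exp maps skew-symmetric matrices (the Lie algebra so(n)) onto SO(n),
# so W stays orthogonal and norm-preserving throughout training.
import torch
import torch.nn as nn


class OrthogonalLinear(nn.Module):
    """Linear map whose weight is constrained to the orthogonal group."""

    def __init__(self, dim: int):
        super().__init__()
        # Unconstrained parameter; its skew-symmetric part lives in so(dim).
        self.raw = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def weight(self) -> torch.Tensor:
        skew = self.raw - self.raw.transpose(-1, -2)   # A - A^T is skew-symmetric
        return torch.matrix_exp(skew)                  # exp(so(n)) lies in SO(n)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight().transpose(-1, -2)


if __name__ == "__main__":
    layer = OrthogonalLinear(64)
    x = torch.randn(8, 16, 64)          # (batch, tokens, dim)
    y = layer(x)
    # Orthogonality preserves per-token norms: ||y|| should match ||x||.
    print(torch.allclose(x.norm(dim=-1), y.norm(dim=-1), atol=1e-4))
```

In a ViT-style block, such an orthogonal weight could stand in for the query/key/value projections so that the attention inputs keep their original scale; the class and variable names above are illustrative assumptions, not identifiers from the paper.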
