Armour: Generalizable Compact Self-Attention for Vision Transformers

08/03/2021
by Lingchuan Meng, et al.

Attention-based transformer networks have demonstrated promising potential as their applications extend from natural language processing to vision. However, despite recent improvements such as sub-quadratic attention approximations and various training enhancements, compact vision transformers that use regular attention still fall short of their convnet counterparts in accuracy, model size, and throughput. This paper introduces a compact self-attention mechanism that is fundamental and highly generalizable. The proposed method reduces redundancy and improves efficiency on top of existing attention optimizations. We show that it can be applied as a drop-in replacement for both the regular attention mechanism and several of the most recent variants in vision transformers. As a result, we produce smaller and faster models with the same or better accuracy.
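
The abstract does not spell out the Armour mechanism itself, so the following is only an orienting sketch: a minimal PyTorch implementation of the regular multi-head self-attention the abstract refers to, next to a hypothetical parameter-reduced variant that shares a single query/key projection. The class names, dimensions, and the shared-projection trick are illustrative assumptions about what a drop-in compact attention replacement can look like, not the paper's method.

```python
# Sketch only (PyTorch assumed). RegularSelfAttention is the standard
# mechanism the abstract refers to; CompactSelfAttention is a hypothetical
# drop-in variant that shares the Q/K projection to cut parameters.
# It is NOT the Armour mechanism, whose details are not given in the abstract.
import torch
import torch.nn as nn


class RegularSelfAttention(nn.Module):
    """Standard multi-head self-attention used in vision transformers."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)   # separate Q, K, V projections
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class CompactSelfAttention(nn.Module):
    """Hypothetical compact variant: queries and keys share one projection,
    so the block stores two input projections instead of three."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qk = nn.Linear(dim, dim)   # shared projection for Q and K
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qk = self.qk(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        attn = (qk @ qk.transpose(-2, -1)) * self.scale  # symmetric attention map
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    tokens = torch.randn(2, 197, 384)  # (batch, patches + cls token, embed dim)
    for block in (RegularSelfAttention(384), CompactSelfAttention(384)):
        n_params = sum(p.numel() for p in block.parameters())
        print(type(block).__name__, block(tokens).shape, f"{n_params} params")
```

Because both modules take and return tensors of shape (batch, tokens, dim), the compact version can be swapped into a transformer block without touching the surrounding architecture, which is the sense in which the paper's mechanism is described as a drop-in replacement.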

Related research

08/15/2023
Attention Is Not All You Need Anymore
In recent years, the popular Transformer architecture has achieved great...

07/07/2022
Vision Transformers: State of the Art and Research Challenges
Transformers have achieved great success in natural language processing...

02/01/2022
Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers
Much of recent Deep Reinforcement Learning success is owed to the neural...

03/21/2023
Online Transformers with Spiking Neurons for Fast Prosthetic Hand Control
Transformers are state-of-the-art networks for most sequence processing...

03/05/2021
Causal Attention for Vision-Language Tasks
We present a novel attention mechanism: Causal Attention (CATT), to remo...

01/22/2022
GLassoformer: A Query-Sparse Transformer for Post-Fault Power Grid Voltage Prediction
We propose GLassoformer, a novel and efficient transformer architecture...

12/10/2021
Human Interpretation and Exploitation of Self-attention Patterns in Transformers: A Case Study in Extractive Summarization
The transformer multi-head self-attention mechanism has been thoroughly...
