Key-Value Transformer

05/28/2023
by Ali Borji, et al.

Transformers have emerged as the prevailing standard solution for a wide range of AI tasks, including computer vision and natural language processing, and the widely adopted Query, Key, and Value (QKV) formulation has played a significant role in this success. Yet no prior work has examined whether all three components are essential to transformer performance. We therefore evaluate the key-value (KV) formulation, which produces symmetric attention maps, along with an asymmetric variant that incorporates a 2D positional encoding into the attention matrix. Notably, the KV transformer requires fewer parameters and less computation than the original. In experiments covering three task types – synthetic tasks (such as reversing or sorting a list), vision (MNIST and CIFAR classification), and NLP (character generation and translation) – we find that the KV transformer occasionally outperforms the QKV transformer but also underperforms it in other cases, making a definitive conclusion difficult to draw. Nonetheless, we consider the reported results encouraging and anticipate that they may pave the way for more efficient transformers in the future.
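As a rough illustration of the idea, below is a minimal sketch of single-head KV attention in PyTorch. It assumes the KV formulation computes attention logits as KK^T / sqrt(d), which is what makes the attention map symmetric, and that the asymmetric variant adds a 2D positional bias to those logits before the softmax; the function name kv_attention, the pos_bias argument, and the weight shapes are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn.functional as F

    def kv_attention(x, w_k, w_v, pos_bias=None):
        # x: (seq_len, d_model); w_k, w_v: (d_model, d_head)
        k = x @ w_k                                    # keys
        v = x @ w_v                                    # values
        # No query projection: logits are K K^T, and (K K^T)^T = K K^T,
        # so the attention map is symmetric across the diagonal.
        logits = k @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        if pos_bias is not None:
            # Asymmetric variant: a (seq_len, seq_len) positional bias
            # added to the logits breaks the symmetry.
            logits = logits + pos_bias
        return F.softmax(logits, dim=-1) @ v

    # Example usage with random weights:
    seq_len, d_model, d_head = 8, 16, 16
    x = torch.randn(seq_len, d_model)
    w_k = torch.randn(d_model, d_head)
    w_v = torch.randn(d_model, d_head)
    out = kv_attention(x, w_k, w_v)                    # (seq_len, d_head)

Under these assumptions, dropping the query projection removes one weight matrix and one matrix multiply per attention head, which is consistent with the parameter and compute savings noted above.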

