Role of Bias Terms in Dot-Product Attention

02/16/2023
by Mahdi Namazifar et al.

Dot-product attention is a core module in the present generation of neural network models, particularly transformers, and is leveraged across numerous areas such as natural language processing and computer vision. The attention module comprises three linear transformations, namely the query, key, and value linear transformations, each of which has a bias term. In this work, we study the role of these bias terms and mathematically show that the bias term of the key linear transformation is redundant and can be omitted without any impact on the output of the attention module. Moreover, we argue that the bias term of the value linear transformation plays a more prominent role than that of the query linear transformation. We empirically verify these findings through multiple experiments on language modeling, natural language understanding, and natural language generation tasks.
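
The key-bias redundancy follows from a short derivation. Below is a minimal sketch of the standard argument; the notation (inputs x_j, projection matrices W_Q, W_K, biases b_Q, b_K, head dimension d) is generic and assumed here, not taken from the abstract itself.

```latex
% Sketch: why the key bias b_K cancels inside the softmax.
% Assumed notation (not from the abstract):
%   query  q_i = W_Q x_i + b_Q,
%   key    k_j = W_K x_j + b_K,  head dimension d.
\[
  q_i^\top k_j
    = q_i^\top \left( W_K x_j + b_K \right)
    = q_i^\top W_K x_j + \underbrace{q_i^\top b_K}_{\text{constant in } j}
\]
% The softmax over j is shift-invariant: adding the same constant to
% every score leaves the attention weights unchanged, so
\[
  \operatorname{softmax}_j\!\left( \frac{q_i^\top k_j}{\sqrt{d}} \right)
  = \operatorname{softmax}_j\!\left( \frac{q_i^\top W_K x_j}{\sqrt{d}} \right),
\]
% and setting b_K = 0 yields exactly the same attention output.
```

By contrast, because the attention weights a_ij sum to one over j, a value bias passes straight through to the output (sum_j a_ij (W_V x_j + b_V) = sum_j a_ij W_V x_j + b_V), which is consistent with the abstract's claim that the value bias plays the more prominent role.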
