Multiplicative Position-aware Transformer Models for Language Understanding

09/27/2021
by Zhiheng Huang, et al.

Transformer models, which leverage architectural improvements such as self-attention, perform remarkably well on Natural Language Processing (NLP) tasks. The self-attention mechanism itself is position-agnostic, so various flavors of absolute and relative position embeddings have been proposed to capture positional ordering information. However, there is no systematic analysis of their contributions, and a comprehensive comparison of these methods is missing from the literature. In this paper, we review the major existing position embedding methods and compare their accuracy on downstream NLP tasks using our own implementations. We also propose a novel multiplicative embedding method that yields superior accuracy compared to existing methods. Finally, we show that our proposed embedding method, serving as a drop-in replacement for the default absolute position embedding, can improve the RoBERTa-base and RoBERTa-large models on the SQuAD 1.1 and SQuAD 2.0 datasets.
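For intuition, below is a minimal, self-contained PyTorch sketch (not the paper's implementation) contrasting the standard additive absolute position embedding used in BERT/RoBERTa with one hypothetical way a multiplicative, position-aware term could enter the attention scores. The module names, shapes, and the specific scoring form are illustrative assumptions, not the method proposed in the paper.

```python
# Sketch only: additive absolute position embedding vs. a hypothetical
# multiplicative position-aware attention score. Not the paper's method.
import math
import torch
import torch.nn as nn

class AdditiveAbsolutePosition(nn.Module):
    """BERT/RoBERTa-style: learned position embeddings are added to token embeddings."""
    def __init__(self, max_len, hidden):
        super().__init__()
        self.pos = nn.Embedding(max_len, hidden)

    def forward(self, token_embeds):               # (batch, seq, hidden)
        seq_len = token_embeds.size(1)
        positions = torch.arange(seq_len, device=token_embeds.device)
        return token_embeds + self.pos(positions)  # broadcast over the batch dim

class MultiplicativePositionAttention(nn.Module):
    """Hypothetical multiplicative variant: a learned scalar per clipped relative
    distance rescales each query-key dot product instead of being added to it."""
    def __init__(self, hidden, max_rel=128):
        super().__init__()
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.max_rel = max_rel
        # one multiplicative gate per clipped relative distance in [-max_rel, max_rel]
        self.rel_scale = nn.Embedding(2 * max_rel + 1, 1)
        nn.init.ones_(self.rel_scale.weight)        # start as plain scaled dot-product attention

    def forward(self, x):                           # (batch, seq, hidden)
        q, k = self.q(x), self.k(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))   # (batch, seq, seq)
        seq_len = x.size(1)
        pos = torch.arange(seq_len, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel, self.max_rel)
        scale = self.rel_scale(rel + self.max_rel).squeeze(-1)     # (seq, seq)
        # returns attention weights only; applying them to values is omitted for brevity
        return torch.softmax(scores * scale, dim=-1)

if __name__ == "__main__":
    x = torch.randn(2, 16, 64)
    print(AdditiveAbsolutePosition(512, 64)(x).shape)      # torch.Size([2, 16, 64])
    print(MultiplicativePositionAttention(64)(x).shape)    # torch.Size([2, 16, 16])
```

Initializing the per-distance scale at one makes the multiplicative variant start out identical to vanilla scaled dot-product attention, which is one natural way such a gate could be introduced as a drop-in change without disturbing a pretrained checkpoint; how the paper's method actually parameterizes the multiplicative interaction is detailed in the full text.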

Related research

- Improve Transformer Models with Better Relative Position Embeddings (09/28/2020)
- RoFormer: Enhanced Transformer with Rotary Position Embedding (04/20/2021)
- Knowledge-Infused Self Attention Transformers (06/23/2023)
- Text Compression-aided Transformer Encoding (02/11/2021)
- Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models (06/10/2021)
- What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding (10/10/2020)
- PiSLTRc: Position-informed Sign Language Transformer with Content-aware Convolution (07/27/2021)
