A Frustratingly Easy Improvement for Position Embeddings via Random Padding

05/08/2023
by   Mingxu Tao, et al.

Position embeddings, which encode the positional relationships among tokens in text sequences, contribute greatly to modeling local context features in Transformer-based pre-trained language models. However, in Extractive Question Answering, position embeddings trained on instances of varied context lengths may not perform as well as we expect. Since the embeddings of rear positions are updated fewer times than those of front positions, the rear ones may be undertrained. In this paper, we propose a simple but effective strategy, Random Padding, which requires no modifications to the architectures of existing pre-trained language models. We adjust the token order of input sequences during fine-tuning to balance the number of times each position embedding is updated. Experiments show that Random Padding significantly improves model performance on instances whose answers are located at rear positions, especially when models are trained on short contexts but evaluated on long contexts. Our code and data will be released for future research.
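The idea of balancing position-embedding updates can be sketched as follows. This is a minimal illustration, not the authors' released implementation: it assumes the strategy moves a random number of the trailing pad tokens to the front of the sequence (after the leading `[CLS]`), so that the real tokens sometimes occupy rear positions. The token ids (`pad_id=0`, `cls_id=101`) are hypothetical BERT-style values, and in practice the attention mask and any answer-span labels would need to be shifted by the same offset.

```python
import random

def random_padding(input_ids, pad_id=0, rng=random):
    """Move a random number of trailing pad tokens to the front
    (after the leading [CLS]-style token), shifting real tokens
    toward rear positions so rear position embeddings are updated
    more often during fine-tuning."""
    # Count the trailing pad tokens.
    n_pad = 0
    for tok in reversed(input_ids):
        if tok != pad_id:
            break
        n_pad += 1
    if n_pad == 0:
        return list(input_ids)

    # Choose how many pads to relocate to the front.
    k = rng.randint(0, n_pad)

    # Real tokens between the leading special token and the padding.
    body = list(input_ids[1:len(input_ids) - n_pad])
    return [input_ids[0]] + [pad_id] * k + body + [pad_id] * (n_pad - k)


# Hypothetical usage: 101 = [CLS], 102 = [SEP], 0 = [PAD]
rng = random.Random(0)
seq = [101, 5, 6, 7, 102, 0, 0, 0]
shifted = random_padding(seq, rng=rng)
```

Because only the placement of pad tokens changes, the sequence length and token multiset are preserved; the model sees the same content, but its position within the 0..L-1 index range varies across training steps.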


