Weighted Sampling for Masked Language Modeling

02/28/2023
by   Linhan Zhang, et al.

Masked Language Modeling (MLM) is widely used to pre-train language models. The standard random masking strategy in MLM biases the pre-trained language models (PLMs) toward high-frequency tokens. As a result, representation learning of rare tokens is poor and PLMs have limited performance on downstream tasks. To alleviate this frequency bias, we propose two simple and effective Weighted Sampling strategies for masking tokens, one based on token frequency and one based on training loss. We apply these two strategies to BERT and obtain Weighted-Sampled BERT (WSBERT). Experiments on the Semantic Textual Similarity (STS) benchmark show that WSBERT significantly improves sentence embeddings over BERT. Combining WSBERT with calibration methods and prompt learning further improves sentence embeddings. We also fine-tune WSBERT on the GLUE benchmark and show that Weighted Sampling improves the transfer learning capability of the backbone PLM. Finally, we analyze how WSBERT improves token embeddings and provide insights into the effect.
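
To make the idea of frequency-based weighted masking concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the weighting formula, so the inverse-count weight, the `smoothing` parameter, and the function names `frequency_weights`, `weighted_mask_positions`, and `apply_mask` are all assumptions made for illustration. The sampler uses the Efraimidis-Spirakis key trick to draw positions without replacement in proportion to their weights.

```python
import random


def frequency_weights(tokens, corpus_counts, smoothing=1.0):
    """Assumed weighting (not from the paper): inverse of the smoothed corpus count,
    so rarer tokens receive larger sampling weights."""
    return [1.0 / (corpus_counts.get(tok, 0) + smoothing) for tok in tokens]


def weighted_mask_positions(weights, mask_ratio=0.15, rng=None):
    """Pick round(len * mask_ratio) positions (at least one), without replacement,
    with probability proportional to the given positive weights."""
    rng = rng or random.Random()
    k = max(1, round(len(weights) * mask_ratio))
    # Efraimidis-Spirakis: draw key = u ** (1 / w) per item and keep the k largest keys.
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    top = sorted(keys, reverse=True)[:k]
    return sorted(i for _, i in top)


def apply_mask(tokens, positions, mask_token="[MASK]"):
    """Replace the tokens at the chosen positions with the mask token."""
    chosen = set(positions)
    return [mask_token if i in chosen else tok for i, tok in enumerate(tokens)]
```

Under these assumptions, a call such as `apply_mask(tokens, weighted_mask_positions(frequency_weights(tokens, counts)))` masks rare tokens far more often than uniform random masking would. The same sampler could in principle be driven by weights derived from per-token training loss instead of corpus counts, but how the paper computes either set of weights is not stated in this abstract.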


