The Effect of Masking Strategies on Knowledge Retention by Language Models

06/12/2023
by Jonas Wallat, et al.

Language models retain a significant amount of world knowledge from their pre-training stage. This allows knowledgeable models to be applied to knowledge-intensive tasks prevalent in information retrieval, such as ranking or question answering. Understanding how, and which, factual information is acquired by our models is necessary to build responsible models. However, limited work has examined the effect of pre-training tasks on the amount of knowledge captured and forgotten by language models during pre-training. Building a better understanding of knowledge acquisition is the goal of this paper. We therefore use a selection of pre-training tasks to infuse knowledge into our model and then test the model's knowledge retention by measuring its ability to answer factual questions. Our experiments show that masking entities and principled masking of correlated spans based on pointwise mutual information lead to more factual knowledge being retained than masking random tokens. Our findings also demonstrate that, like the ability to perform a task, the factual knowledge acquired during training on one task is forgotten when the model is subsequently trained on another task (catastrophic forgetting), and they show how this forgetting can be prevented. To foster reproducibility, both the code and the data used in this paper are openly available.
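The PMI-based span masking mentioned in the abstract can be illustrated with a small sketch. The snippet below is a minimal, hypothetical example rather than the paper's released implementation: it scores candidate n-gram spans by pointwise mutual information computed from corpus counts and masks whole high-PMI spans before falling back to random single-token masking. The function names, the PMI threshold, and the 15% masking ratio are illustrative assumptions, not values taken from the paper.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

def collect_ngram_counts(corpus: List[List[str]], max_n: int = 3):
    """Count unigrams and n-grams (2 <= n <= max_n) over a tokenized corpus."""
    unigrams, ngrams = Counter(), Counter()
    total_tokens = 0
    for tokens in corpus:
        total_tokens += len(tokens)
        unigrams.update(tokens)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngrams[tuple(tokens[i:i + n])] += 1
    return unigrams, ngrams, total_tokens

def span_pmi(span: Tuple[str, ...], unigrams, ngrams, total_tokens: int) -> float:
    """PMI of a span: log p(span) - sum_i log p(token_i).
    A high value means the tokens co-occur far more often than chance,
    i.e. they likely form a correlated span such as a multi-word entity."""
    count = ngrams.get(span, 0)
    if count == 0:
        return float("-inf")
    log_p_span = math.log(count / total_tokens)
    log_p_indep = sum(math.log(unigrams[t] / total_tokens) for t in span)
    return log_p_span - log_p_indep

def pmi_masking(tokens: List[str], unigrams, ngrams, total_tokens: int,
                mask_token: str = "[MASK]", mask_ratio: float = 0.15,
                pmi_threshold: float = 3.0, max_n: int = 3) -> List[str]:
    """Mask whole high-PMI spans first, then fall back to random tokens
    until roughly `mask_ratio` of the sequence is masked (assumed values)."""
    masked = list(tokens)
    budget = max(1, int(mask_ratio * len(tokens)))
    i = 0
    while i < len(tokens) and budget > 0:
        best_len = 0
        # Prefer the longest span at this position whose PMI clears the threshold.
        for n in range(max_n, 1, -1):
            span = tuple(tokens[i:i + n])
            if len(span) == n and span_pmi(span, unigrams, ngrams, total_tokens) >= pmi_threshold:
                best_len = n
                break
        if best_len and best_len <= budget:
            for j in range(i, i + best_len):
                masked[j] = mask_token
            budget -= best_len
            i += best_len
        else:
            i += 1
    # Spend any remaining budget on random single-token masking.
    candidates = [k for k, t in enumerate(masked) if t != mask_token]
    for k in random.sample(candidates, min(budget, len(candidates))):
        masked[k] = mask_token
    return masked
```

The intuition behind masking correlated spans jointly is that tokens with high PMI (for example the pieces of a multi-word entity) are easy to predict from one another, so masking only part of the span lets the model copy the visible half instead of recalling the underlying fact.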

Related research

05/16/2023 - Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
Memory is one of the most essential cognitive functions serving as a rep...

05/24/2023 - Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models
While pre-trained language models (PLMs) have shown evidence of acquirin...

11/15/2022 - Large Language Models Struggle to Learn Long-Tail Knowledge
The internet contains a wealth of knowledge – from the birthdays of hist...

08/28/2023 - Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA
Knowledge Base Question Answering (KBQA) aims to answer natural language...

08/29/2023 - Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability
How do language models learn to make predictions during pre-training? To...

05/21/2022 - An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs
Self-supervision based on the information extracted from large knowledge...

11/26/2022 - Gender Biases Unexpectedly Fluctuate in the Pre-training Stage of Masked Language Models
Masked language models pick up gender biases during pre-training. Such b...
