Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models

05/16/2023
by Boxi Cao, et al.

Memory is one of the most essential cognitive functions, serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. In contrast, vanilla neural networks without pre-training have long been observed to suffer from catastrophic forgetting. To investigate this retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments that control the target knowledge types, the learning strategies, and the learning schedules. We find that: 1) vanilla language models are forgetful; 2) pre-training leads to retentive language models; 3) knowledge relevance and diversification significantly influence memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms for language models.
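To make the kind of measurement described above concrete, the sketch below shows one simple way to probe factual retention with cloze-style prompts using the Hugging Face transformers fill-mask pipeline. This is a minimal illustration, not the authors' experimental protocol: the model name (bert-base-uncased) and the probe prompts are assumptions chosen for clarity. The idea is to run the same probe before and after further training and compare accuracy to quantify forgetting.

```python
# Minimal sketch, assuming the Hugging Face `transformers` library is installed.
# The model name and probe prompts are illustrative assumptions, not the paper's setup.
from transformers import pipeline

# Hypothetical factual probes in (cloze prompt, expected answer) form.
PROBES = [
    ("The capital of France is [MASK].", "paris"),
    ("Water is composed of hydrogen and [MASK].", "oxygen"),
]

def probe_accuracy(model_name: str, top_k: int = 5) -> float:
    """Fraction of probes whose expected answer appears among the top-k fill-mask predictions."""
    fill_mask = pipeline("fill-mask", model=model_name)
    hits = 0
    for prompt, answer in PROBES:
        predictions = fill_mask(prompt, top_k=top_k)
        if any(p["token_str"].strip().lower() == answer for p in predictions):
            hits += 1
    return hits / len(PROBES)

if __name__ == "__main__":
    # Running the same probe before and after further training (e.g. fine-tuning on
    # unrelated data) quantifies how much memorized knowledge is retained or forgotten.
    print(f"probe accuracy: {probe_accuracy('bert-base-uncased'):.2f}")
```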

Related research

06/12/2023  The Effect of Masking Strategies on Knowledge Retention by Language Models
            Language models retain a significant amount of world knowledge from thei...

06/07/2023  ModuleFormer: Learning Modular Large Language Models From Uncurated Data
            Large Language Models (LLMs) have achieved remarkable results. But exist...

11/08/2022  SocioProbe: What, When, and Where Language Models Learn about Sociodemographics
            Pre-trained language models (PLMs) have outperformed other NLP models on...

03/16/2022  Shepherd Pre-trained Language Models to Develop a Train of Thought: An Iterative Prompting Approach
            While Pre-trained Language Models (PLMs) internalize a great amount of w...

04/17/2023  Effectiveness of Debiasing Techniques: An Indigenous Qualitative Analysis
            An indigenous perspective on the effectiveness of debiasing techniques f...

06/15/2023  KoLA: Carefully Benchmarking World Knowledge of Large Language Models
            The unprecedented performance of large language models (LLMs) necessitat...

01/03/2023  Language Models are Drummers: Drum Composition with Natural Language Pre-Training
            Automatic music generation with artificial intelligence typically requir...
