Counterfactual Memorization in Neural Language Models

12/24/2021
by Chiyuan Zhang, et al.

Modern neural language models, widely used across NLP tasks, risk memorizing sensitive information from their training data. As models continue to scale up in parameters, training data, and compute, understanding memorization in language models is both important from a learning-theoretic point of view and practically crucial in real-world applications. An open question in previous studies of memorization in language models is how to filter out "common" memorization: most memorization criteria correlate strongly with the number of occurrences in the training set, and so capture "common" memorization such as familiar phrases, public knowledge, and templated texts. In this paper, we provide a principled perspective inspired by a taxonomy of human memory in psychology. From this perspective, we formulate a notion of counterfactual memorization, which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually memorized training examples in standard text datasets. We further estimate the influence of each training example on the validation set and on generated texts, and show that this can provide direct evidence about the source of memorization at test time.
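The counterfactual formulation lends itself to a simple Monte Carlo estimate: train several models on random subsets of the training set, then compare each example's score between the models that saw it and the models that did not. The sketch below is illustrative only; the function name, the array layout, and the choice of per-example performance metric are assumptions of this sketch, not the paper's code.

```python
import numpy as np

def counterfactual_memorization(in_subset, per_example_perf):
    """Estimate counterfactual memorization for each training example.

    Assumed inputs (hypothetical layout, not from the paper):
      in_subset:        (num_models, num_examples) boolean array; entry
                        [m, i] is True if example i was in model m's
                        random training subset.
      per_example_perf: (num_models, num_examples) float array; entry
                        [m, i] is model m's performance on example i
                        (e.g. per-token accuracy).

    Returns a (num_examples,) array: mean performance of models trained
    WITH each example minus mean performance of models trained WITHOUT it.
    """
    in_subset = np.asarray(in_subset, dtype=bool)
    perf = np.asarray(per_example_perf, dtype=float)

    # Average performance over the models whose subset included the example...
    mean_in = np.nanmean(np.where(in_subset, perf, np.nan), axis=0)

    # ...and over the models whose subset excluded it.
    mean_out = np.nanmean(np.where(~in_subset, perf, np.nan), axis=0)

    # Note: an example present in (or absent from) every subset yields NaN;
    # in practice, subset sizes are chosen so both cases are well sampled.
    return mean_in - mean_out
```

Under this estimator, "common" examples score near zero, since models that never saw them still predict them well from duplicates and similar text, while counterfactually memorized examples are predicted well only by the models that trained on them.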
