Tracing Knowledge in Language Models Back to the Training Data

05/23/2022
by Ekin Akyürek, et al.

Neural language models (LMs) have been shown to memorize a great deal of factual knowledge. But when an LM generates an assertion, it is often difficult to determine where it learned this information and whether it is true. In this paper, we introduce a new benchmark for fact tracing: tracing language models' assertions back to the training examples that provided evidence for those predictions. Prior work has suggested that dataset-level influence methods might offer an effective framework for tracing predictions back to training data. However, such methods have not been evaluated for fact tracing, and researchers have primarily studied them through qualitative analysis or as a data-cleaning technique for classification/regression tasks. We present the first experiments that evaluate influence methods for fact tracing, using well-understood information retrieval (IR) metrics. We compare two popular families of influence methods, gradient-based and embedding-based, and show that neither can fact-trace reliably; indeed, both methods fail to outperform an IR baseline (BM25) that does not even access the LM. We explore why this occurs (e.g., gradient saturation) and demonstrate that existing influence methods must be improved significantly before they can reliably attribute factual predictions in LMs.
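To make the comparison concrete, here is a minimal sketch (not the paper's code) of the two influence families the abstract contrasts and the IR-style evaluation it advocates. Everything in it is an illustrative assumption: the toy linear model stands in for an LM, the data are random, and the `proponents` set plays the role of ground-truth evidence annotations.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for an LM: a linear classifier over 8-dim features.
# Its per-example loss gradients play the role of LM gradients.
model = torch.nn.Linear(8, 2)
loss_fn = torch.nn.CrossEntropyLoss()

def grad_vector(x, y):
    """Flattened loss gradient for a single example."""
    model.zero_grad()
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

# Hypothetical training set and one query "assertion" (all random).
train_x = torch.randn(5, 8)
train_y = torch.randint(0, 2, (5,))
query_x, query_y = torch.randn(8), torch.tensor(1)

q_grad = grad_vector(query_x, query_y)

# Gradient-based influence (TracIn-style): rank training examples by the
# dot product of their loss gradient with the query's loss gradient.
grad_scores = torch.stack(
    [grad_vector(x, y) @ q_grad for x, y in zip(train_x, train_y)]
)

# Embedding-based influence: rank by cosine similarity of representations
# (raw features stand in for encoder embeddings here).
emb_scores = torch.nn.functional.cosine_similarity(
    train_x, query_x.unsqueeze(0), dim=1
)

def reciprocal_rank(scores, proponents):
    """IR-style evaluation: 1 / rank of the first ground-truth 'proponent'
    training example in the induced ranking (averaged over queries -> MRR)."""
    ranking = scores.argsort(descending=True).tolist()
    return 1.0 / (min(ranking.index(p) for p in proponents) + 1)

proponents = {2}  # assumed annotation: which training example is the evidence
print("gradient-based RR:", reciprocal_rank(grad_scores, proponents))
print("embedding-based RR:", reciprocal_rank(emb_scores, proponents))
```

In the paper's actual setting, the gradients would come from a fine-tuned LM, a BM25 retriever over raw training text supplies the model-free baseline, and reciprocal ranks are averaged over many annotated queries to give MRR; the sketch only illustrates the ranking-plus-IR-metric evaluation pattern.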

Related research

Enhancing Automated Software Traceability by Transfer Learning from Open-World Data (07/03/2022)
Software requirements traceability is a critical component of the softwa...

Tracing and Removing Data Errors in Natural Language Generation Datasets (12/21/2022)
Recent work has identified noisy and misannotated data as a core cause o...

Counterfactual Memorization in Neural Language Models (12/24/2021)
Modern neural language models widely used in tasks across NLP risk memor...

Language Models as Fact Checkers? (06/07/2020)
Recent work has suggested that language models (LMs) store both common-s...

TRAK: Attributing Model Behavior at Scale (03/24/2023)
The goal of data attribution is to trace model predictions back to train...

Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers (01/17/2023)
How language models process complex input that requires multiple steps o...

First is Better Than Last for Training Data Influence (02/24/2022)
The ability to identify influential training examples enables us to debu...
