Locating and Editing Factual Knowledge in GPT

02/10/2022
by Kevin Meng, et al.

We investigate the mechanisms underlying factual knowledge recall in autoregressive transformer language models. First, we develop a causal intervention for identifying neuron activations capable of altering a model's factual predictions. Within large GPT-style models, this reveals two distinct sets of neurons that we hypothesize correspond to knowing an abstract fact and saying a concrete word, respectively. This insight inspires the development of ROME, a novel method for editing facts stored in model weights. For evaluation, we assemble CounterFact, a dataset of over twenty thousand counterfactuals and tools to facilitate sensitive measurements of knowledge editing. Using CounterFact, we confirm the distinction between saying and knowing neurons, and we find that ROME achieves state-of-the-art performance in knowledge editing compared to other methods. An interactive demo notebook, full code implementation, and the dataset are available at https://rome.baulab.info/.
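To make the editing idea concrete, below is a minimal numerical sketch of a rank-one weight update in the spirit of ROME, treating a fact as a key-value association stored in an MLP projection W. The key vector k_star, target value v_star, and key second-moment matrix C are illustrative placeholders generated at random, not quantities from the paper's released implementation.

```python
# Sketch (under assumptions): rank-one edit of an MLP projection so that a
# chosen key reads out a new value, while the change is confined to a single
# outer-product direction. All tensors here are synthetic placeholders.
import torch

torch.manual_seed(0)
d_key, d_val = 64, 48

# Original MLP down-projection: values are read out as v = W @ k.
W = torch.randn(d_val, d_key) / d_key ** 0.5

k_star = torch.randn(d_key)   # placeholder key activated by the subject
v_star = torch.randn(d_val)   # placeholder value encoding the new object

# Stand-in for the uncentered key covariance C = E[k k^T]; in practice this
# would be estimated from a large text sample. Here: a random SPD matrix.
A = torch.randn(d_key, d_key)
C = A @ A.T / d_key + torch.eye(d_key)

# Rank-one update enforcing W_hat @ k_star == v_star, with the change directed
# along C^{-1} k_star (the constrained least-squares direction under C).
u = torch.linalg.solve(C, k_star)                      # C^{-1} k_star
W_hat = W + torch.outer(v_star - W @ k_star, u) / (u @ k_star)

print(torch.allclose(W_hat @ k_star, v_star, atol=1e-3))  # edited fact reads out
```

In the paper, the key is derived from the model's own hidden activations at the subject tokens and the target value is found by optimization against the desired prediction; the sketch above only checks the algebra of the weight update itself.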

Related research

01/10/2023 · Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Language models are known to learn a great quantity of factual informati...

07/24/2023 · Evaluating the Ripple Effects of Knowledge Editing in Language Models
Modern language models capture a large body of factual knowledge. Howeve...

08/25/2023 · Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Pre-trained language models (PLMs) contain vast amounts of factual knowl...

08/17/2023 · PMET: Precise Model Editing in a Transformer
Model editing techniques modify a minor proportion of knowledge in Large...

01/24/2023 · Transformer-Patcher: One Mistake worth One Neuron
Large Transformer-based Pretrained Language Models (PLMs) dominate almos...

08/19/2023 · Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs
Large language models (LLMs) possess a wealth of knowledge encoded in th...

07/08/2023 · Toward Interactive Dictation
Voice dictation is an increasingly important text input modality. Existi...
