Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

01/10/2023
by Peter Hase, et al.

Language models are known to learn large amounts of factual information during pretraining, and recent work localizes this information to specific model weights, such as mid-layer MLP weights (Meng et al., 2022). In this paper, we find that we can change how a fact is stored in a model by editing weights at a different location from where existing methods suggest the fact is stored. This is surprising because we would expect localizing facts to specific parameters to tell us where to manipulate knowledge in models, and this assumption has motivated past work on model editing methods. Specifically, we show that localization conclusions from representation denoising (also known as Causal Tracing) provide no insight into which model MLP layer is best to edit in order to override an existing stored fact with a new one. This finding raises questions about how past work relies on Causal Tracing to select which model layers to edit (Meng et al., 2022). Next, to better understand the discrepancy between representation denoising and weight editing, we develop several variants of the editing problem whose design and objective increasingly resemble representation denoising. Experiments show that, for one of these editing problems, editing performance does relate to localization results from representation denoising, but which layer we edit remains a far better predictor of performance. Our results suggest, counterintuitively, that a better mechanistic understanding of how pretrained language models work may not always translate into insights about how best to change their behavior. Code is available at: https://github.com/google/belief-localization
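For context, the sketch below illustrates the representation denoising (Causal Tracing) procedure the abstract refers to: corrupt the subject token embeddings with noise, then rerun the model while restoring the clean hidden state at a single layer and measure how much of the correct answer's probability is recovered. It assumes GPT-2 accessed through the HuggingFace transformers library; the prompt, subject token span, and noise scale are illustrative assumptions rather than the paper's exact experimental setup.

```python
# A minimal sketch of Causal Tracing (representation denoising), assuming GPT-2
# via the HuggingFace transformers library. The prompt, subject span, and noise
# scale are illustrative assumptions, not the paper's exact settings.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "The Eiffel Tower is located in the city of"
answer = " Paris"
subject_span = (0, 3)   # token positions assumed to cover the subject "The Eiffel Tower"
noise_scale = 0.1       # illustrative corruption strength

inputs = tok(prompt, return_tensors="pt")
answer_id = tok(answer)["input_ids"][0]

def answer_prob(logits):
    """Probability of the correct answer token at the final position."""
    return torch.softmax(logits[0, -1], dim=-1)[answer_id].item()

with torch.no_grad():
    # 1) Clean run: cache hidden states at every layer.
    clean = model(**inputs, output_hidden_states=True)
    clean_hidden = clean.hidden_states          # (n_layer + 1) tensors of shape [1, seq, dim]
    p_clean = answer_prob(clean.logits)

    # 2) Corrupted run: add Gaussian noise to the subject token embeddings.
    embeds = model.transformer.wte(inputs["input_ids"]).clone()
    lo, hi = subject_span
    embeds[:, lo:hi] += noise_scale * torch.randn_like(embeds[:, lo:hi])
    p_corrupt = answer_prob(model(inputs_embeds=embeds).logits)

def trace_effect(layer_idx, token_idx):
    """3) Denoise: rerun the corrupted input while restoring the clean hidden
    state of one layer at one token position, and measure how much of the
    correct answer's probability is recovered."""
    def restore(module, inp, out):
        out[0][:, token_idx] = clean_hidden[layer_idx + 1][:, token_idx]
        return out
    handle = model.transformer.h[layer_idx].register_forward_hook(restore)
    with torch.no_grad():
        p_restored = answer_prob(model(inputs_embeds=embeds).logits)
    handle.remove()
    return p_restored - p_corrupt

# Tracing effect per layer at the last subject token: large values mark layers
# whose clean representation restores the original prediction.
effects = [trace_effect(l, subject_span[1] - 1) for l in range(model.config.n_layer)]
print(f"p(answer) clean={p_clean:.3f}  corrupted={p_corrupt:.3f}")
print("per-layer tracing effects:", [round(e, 3) for e in effects])
```

Layers with large tracing effects are the ones past work (Meng et al., 2022) treats as storing the fact; the paper's central finding is that these are not necessarily the layers where editing works best.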

Related research:

EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models (08/14/2023)
Large Language Models (LLMs) usually suffer from knowledge cutoff or fal...

Locating and Editing Factual Knowledge in GPT (02/10/2022)
We investigate the mechanisms underlying factual knowledge recall in aut...

Can We Edit Factual Knowledge by In-Context Learning? (05/22/2023)
Previous studies have shown that large language models (LLMs) like GPTs ...

Editing Large Language Models: Problems, Methods, and Opportunities (05/22/2023)
Recent advancements in deep learning have precipitated the emergence of ...

PMET: Precise Model Editing in a Transformer (08/17/2023)
Model editing techniques modify a minor proportion of knowledge in Large...

Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors (11/20/2022)
Large pre-trained models decay over long-term deployment as input distri...

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP (08/27/2023)
Mechanistic interpretability seeks to understand the neural mechanisms t...
