Discovering Variable Binding Circuitry with Desiderata

07/07/2023
by Xander Davies, et al.

Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits. Here, we introduce an approach that extends causal mediation experiments to automatically identify the model components responsible for performing a specific subtask, solely by specifying a set of desiderata: causal attributes of the model components executing that subtask. As a proof of concept, we apply our method to automatically discover shared variable binding circuitry in LLaMA-13B, which retrieves variable values for multiple arithmetic tasks. Our method successfully localizes variable binding to only 9 attention heads (of the 1.6k) and one MLP in the final token's residual stream.
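To make the idea of a desideratum concrete, the following is a minimal sketch on a toy additive model, not the authors' method and not LLaMA-13B. It assumes three hypothetical components that each add a number to a shared output, and it brute-forces single-component activation patching instead of whatever search procedure the paper uses. Each desideratum here is an (original input, counterfactual input, target output) triple; all names, inputs, and the component definitions are invented for illustration.

```python
# Toy illustration only: three hypothetical "components" write additively
# into a shared output, loosely mimicking a residual stream.
COMPONENTS = [
    lambda x: float(x[0]),   # hypothetical "binding" component: reads the variable value
    lambda x: 0.1 * x[1],    # reads an unrelated distractor feature
    lambda x: 1.0,           # constant bias
]

def run(x, patch=None):
    """Return (per-component outputs, summed output); `patch` overrides
    selected component outputs with values cached from another run."""
    patch = patch or {}
    outs = [patch.get(i, f(x)) for i, f in enumerate(COMPONENTS)]
    return outs, sum(outs)

def satisfies(component, original, counterfactual, target):
    """Patch one component's counterfactual activation into the original run
    and check whether the output matches the desideratum's target."""
    cf_outs, _ = run(counterfactual)
    _, patched = run(original, patch={component: cf_outs[component]})
    return abs(patched - target) < 1e-9

original = (3, 5)  # (variable value, distractor); assumed toy input
desiderata = [
    # Changing the value: patching should yield the counterfactual answer.
    (original, (7, 5), run((7, 5))[1]),
    # Changing only the distractor: patching should leave the answer unchanged.
    (original, (3, 9), run(original)[1]),
]

# Keep only the components that satisfy every desideratum.
selected = [
    i for i in range(len(COMPONENTS))
    if all(satisfies(i, o, cf, t) for o, cf, t in desiderata)
]
print(selected)  # prints [0]
```

In this toy setting only component 0 passes both checks: the distractor component fails the first, and both it and no other component changes the answer under the second. Exhaustive per-component patching like this would not scale to roughly 1.6k heads with interacting effects, which is presumably why an automated search over components is needed.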


research
05/24/2023

Understanding Arithmetic Reasoning in Language Models using Causal Mediation Analysis

Mathematical reasoning in large language models (LLMs) has garnered atte...
research
10/24/2022

Rejoinder to discussions on "Instrumental variable estimation of the causal hazard ratio"

We respond to comments on our paper, titled "Instrumental variable estim...
research
08/02/2023

Arithmetic with Language Models: from Memorization to Computation

A better understanding of the emergent computation and problem-solving c...
research
10/08/2021

Causal ImageNet: How to discover spurious features in Deep Learning?

A key reason for the lack of reliability of deep neural networks in the ...
research
01/10/2019

Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Variable importance is central to scientific studies, including the soci...
research
10/05/2020

The Traveling Observer Model: Multi-task Learning Through Spatial Variable Embeddings

This paper frames a general prediction system as an observer traveling a...
