Localizing Model Behavior with Path Patching

04/12/2023
by Nicholas Goldowsky-Dill, et al.

Localizing behaviors of neural networks to a subset of the network's components, or to a subset of the interactions between components, is a natural first step towards analyzing network mechanisms and possible failure modes. Existing work is often qualitative and ad hoc, and there is no consensus on the appropriate way to evaluate localization claims. We introduce path patching, a technique for expressing and quantitatively testing a natural class of hypotheses stating that behaviors are localized to a set of paths. We refine an explanation of induction heads, characterize a behavior of GPT-2, and open-source a framework for efficiently running similar experiments.
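To make the idea of "localized to a set of paths" concrete, here is a minimal sketch on a toy model, not the authors' framework. It assumes a hypothetical two-component network (`comp_a`, `comp_b`) with random weights, in which the output can be reached along four paths. Paths named in the hypothesis see the reference input `x_ref`; all other paths see an alternative input `x_alt`. All names, shapes, and the choice of `tanh` are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy model: out = w_out . (x + a(x) + b(x + a(x)))
rng = np.random.default_rng(0)
W_a = rng.normal(size=(4, 4))
W_b = rng.normal(size=(4, 4))
w_out = rng.normal(size=4)

# The four input-to-output paths of the toy model.
PATHS = ("x->out", "x->a->out", "x->b->out", "x->a->b->out")


def comp_a(x):
    return np.tanh(W_a @ x)


def comp_b(resid):
    return np.tanh(W_b @ resid)


def forward_by_path(x_direct_out, x_a_out, x_direct_b, x_a_b):
    """Run the model with a separately chosen input for each path.

    Arguments follow the order of PATHS. Passing the same x four times
    reproduces the ordinary forward pass.
    """
    a_for_out = comp_a(x_a_out)                # path x->a->out
    b = comp_b(x_direct_b + comp_a(x_a_b))     # paths x->b->out and x->a->b->out
    return w_out @ (x_direct_out + a_for_out + b)


def path_patched_output(x_ref, x_alt, important):
    """Feed x_ref along paths in `important` and x_alt along every other path."""
    inputs = [x_ref if p in important else x_alt for p in PATHS]
    return forward_by_path(*inputs)


x_ref = rng.normal(size=4)
x_alt = rng.normal(size=4)

clean = forward_by_path(x_ref, x_ref, x_ref, x_ref)
# Hypothesis under test: only the path through both components matters.
patched = path_patched_output(x_ref, x_alt, important={"x->a->b->out"})
print(f"clean={clean:.3f}  patched={patched:.3f}")
```

Under these assumptions, a hypothesis would be scored by comparing a behavioral metric on the patched runs against the clean runs, averaged over many `(x_ref, x_alt)` pairs; a small gap suggests that the named paths account for the behavior.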


