Tracr: Compiled Transformers as a Laboratory for Interpretability

01/12/2023
by David Lindner, et al.

Interpretability research aims to build tools for understanding machine learning (ML) models. However, such tools are inherently hard to evaluate because we do not have ground truth information about how ML models actually work. In this work, we propose to build transformer models manually as a testbed for interpretability research. We introduce Tracr, a "compiler" for translating human-readable programs into weights of a transformer model. Tracr takes code written in RASP, a domain-specific language (Weiss et al. 2021), and translates it into weights for a standard, decoder-only, GPT-like transformer architecture. We use Tracr to create a range of ground truth transformers that implement programs including computing token frequencies, sorting, and Dyck-n parenthesis checking. To enable the broader research community to explore and use compiled models, we provide an open-source implementation of Tracr at https://github.com/deepmind/tracr.
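To make the compilation pipeline concrete, here is a minimal sketch in the spirit of the repository's examples: a RASP program that computes the length of the input sequence is compiled into a decoder-only transformer and run on a short input. The API names used here (compile_rasp_to_model, compiler_bos, model.apply) follow the open-source implementation at the time of writing and may differ in later versions.

```python
from tracr.compiler import compiling
from tracr.rasp import rasp

# RASP program: output the length of the input sequence at every position.
# Select(tokens, tokens, TRUE) lets every position attend to every position;
# SelectorWidth counts the attended-to positions, which equals the length.
def make_length():
    all_true_selector = rasp.Select(
        rasp.tokens, rasp.tokens, rasp.Comparison.TRUE)
    return rasp.SelectorWidth(all_true_selector)

program = make_length()

# Compile the RASP program into the weights of a GPT-like transformer.
model = compiling.compile_rasp_to_model(
    program,
    vocab={1, 2, 3},      # token vocabulary the model should support
    max_seq_len=5,        # maximum input length to compile for
    compiler_bos="BOS",   # special beginning-of-sequence token
)

# Run the compiled transformer; each position should output the length (3).
out = model.apply(["BOS", 1, 2, 3])
print(out.decoded)  # expected: ["BOS", 3, 3, 3]
```

Because the weights are compiled rather than learned, the intended computation of every model component is known exactly, which is what makes these models usable as ground truth for evaluating interpretability tools.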


