LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models

04/26/2022
by   Mor Geva, et al.

The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred wide interest in interpreting their predictions. However, current interpretation methods mostly probe models from the outside, executing behavioral tests and analyzing the salience of input features, while the internal prediction-construction process remains largely unexplored. In this work, we introduce LM-Debugger, an interactive debugging tool for transformer-based LMs, which provides a fine-grained interpretation of the model's internal prediction process, as well as a powerful framework for intervening in LM behavior. For its backbone, LM-Debugger relies on a recent method that interprets the inner token representations, and their updates by the feed-forward layers, in the vocabulary space. We demonstrate the utility of LM-Debugger for single-prediction debugging by inspecting the internal disambiguation process performed by GPT2. Moreover, we show how easily LM-Debugger allows shifting model behavior in a direction of the user's choice, by identifying a few vectors in the network and inducing effective interventions in the prediction process. We release LM-Debugger as an open-source tool and a demo over GPT2 models.
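The backbone interpretation idea can be illustrated with a minimal sketch: project a hidden state, or a feed-forward update vector, through the output embedding matrix and read off which vocabulary tokens it promotes. Everything below (the shapes, the toy vocabulary, the helper name `project_to_vocab`) is a hypothetical stand-in for illustration, not the actual LM-Debugger code or API.

```python
import numpy as np

# Toy setup: a tiny "model" with 8-dimensional hidden states and a
# 5-token vocabulary. In a real LM, E would be the output (unembedding)
# matrix of shape (vocab_size, d_model).
d_model, vocab_size = 8, 5
vocab = ["the", "cat", "sat", "on", "mat"]

# Toy output embedding matrix: one orthonormal row per vocabulary token.
E = np.eye(vocab_size, d_model)

def project_to_vocab(vector, k=3):
    """Return the top-k vocabulary tokens promoted by `vector`."""
    logits = E @ vector                     # score against each token's row
    top = np.argsort(logits)[::-1][:k]      # highest-scoring tokens first
    return [(vocab[i], float(logits[i])) for i in top]

# A feed-forward update that promotes "cat" strongly and "sat" weakly:
ffn_update = 0.8 * E[1] + 0.3 * E[2]
print(project_to_vocab(ffn_update))         # "cat" ranks first, "sat" second
```

Reading feed-forward updates this way is what lets a user both inspect which concepts each layer promotes and, conversely, intervene by amplifying or suppressing specific value vectors.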


Related research

- Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space (03/28/2022)
- Understanding Transformer Memorization Recall Through Idioms (10/07/2022)
- Analyzing And Editing Inner Mechanisms Of Backdoored Language Models (02/24/2023)
- Visualizing Attention in Transformer-Based Language Representation Models (04/04/2019)
- Garden-Path Traversal within GPT-2 (05/24/2022)
- Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT (05/22/2023)
