Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

03/28/2022
by Mor Geva, et al.

Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction-construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process by reverse-engineering the operation of the feed-forward network (FFN) layers, one of the building blocks of transformer models. We view the token representation as a changing distribution over the vocabulary, and the output from each FFN layer as an additive update to that distribution. Then, we analyze the FFN updates in the vocabulary space, showing that each update can be decomposed into sub-updates corresponding to single FFN parameter vectors, each promoting concepts that are often human-interpretable. We then leverage these findings for controlling LM predictions, where we reduce the toxicity of GPT2 by almost 50%, and for improving computation efficiency with a simple early-exit rule, saving 20% of computation on average.
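The decomposition described in the abstract can be sketched numerically: the FFN output is a weighted sum of its value (second-layer) parameter vectors, and each such vector can be read in vocabulary space by projecting it through the output embedding matrix. The sketch below uses toy dimensions, random matrices, and a made-up vocabulary purely for illustration; real models are far larger and the embedding matrix is learned, not random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration; real models are far larger).
d_model, d_ff, vocab_size = 8, 16, 10
vocab = [f"tok{i}" for i in range(vocab_size)]

# FFN parameters: keys K (first layer) and values V (second layer),
# plus an output embedding matrix E used to read vectors in vocab space.
K = rng.normal(size=(d_ff, d_model))   # rows k_i
V = rng.normal(size=(d_ff, d_model))   # rows v_i (the parameter vectors)
E = rng.normal(size=(vocab_size, d_model))

x = rng.normal(size=d_model)           # token representation entering the FFN

# FFN output as a sum of sub-updates m_i * v_i, where m_i = f(x . k_i).
m = np.maximum(K @ x, 0.0)             # ReLU activations (coefficients)
ffn_out = m @ V                        # equals sum_i m_i * v_i

# Inspect one sub-update in vocabulary space: project v_i through E
# and look at the tokens it promotes most strongly.
i = int(np.argmax(m))                  # the most active parameter vector
logits = E @ V[i]
top = np.argsort(logits)[::-1][:3]
print(f"top tokens promoted by v_{i}:", [vocab[t] for t in top])

# Sanity check: the full update is exactly the sum of its sub-updates.
assert np.allclose(ffn_out, sum(m_j * v_j for m_j, v_j in zip(m, V)))
```

In the paper's terms, each row `V[i]` is a single FFN parameter vector whose sub-update `m[i] * V[i]` additively shifts the token distribution; ranking `E @ V[i]` surfaces the concepts that vector promotes.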

Related research

04/26/2022
LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models
The opaque nature and unexplained behavior of transformer-based language...

12/29/2020
Transformer Feed-Forward Layers Are Key-Value Memories
Feed-forward layers constitute two-thirds of a transformer model's param...

02/01/2023
Feed-Forward Blocks Control Contextualization in Masked Language Models
Understanding the inner workings of neural network models is a crucial s...

10/07/2022
Understanding Transformer Memorization Recall Through Idioms
To produce accurate predictions, language models (LMs) must balance betw...

05/29/2023
Brainformers: Trading Simplicity for Efficiency
Transformers are central to recent successes in natural language process...

03/16/2023
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
Transformer-based language models (LMs) create hidden representations of...

08/09/2021
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Neural painting refers to the procedure of producing a series of strokes...
