Dissecting Recall of Factual Associations in Auto-Regressive Language Models

04/28/2023
by Mor Geva et al.

Transformer-based language models (LMs) are known to capture factual knowledge in their parameters. While previous work looked into where factual associations are stored, little is known about how they are retrieved internally during inference. We investigate this question through the lens of information flow. Given a subject-relation query, we study how the model aggregates information about the subject and relation to predict the correct attribute. With interventions on attention edges, we first identify two critical points where information propagates to the prediction: one from the relation positions, followed by another from the subject positions. Next, by analyzing the information at these points, we unveil a three-step internal mechanism for attribute extraction. First, the representation at the last-subject position goes through an enrichment process, driven by the early MLP sublayers, to encode many subject-related attributes. Second, information from the relation propagates to the prediction. Third, the prediction representation "queries" the enriched subject to extract the attribute. Perhaps surprisingly, this extraction is typically done via attention heads, which often encode subject-attribute mappings in their parameters. Overall, our findings introduce a comprehensive view of how factual associations are stored and extracted internally in LMs, facilitating future research on knowledge localization and editing.
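The attention-edge interventions described above (often called "attention knockout") amount to zeroing the attention weight from a chosen source position to a chosen target position, so no information flows along that edge. The sketch below is a hypothetical, minimal single-head causal self-attention in NumPy, not the authors' code: blocked edges receive a score of negative infinity before the softmax, so their weight becomes exactly zero.

```python
import numpy as np

def attention_with_knockout(q, k, v, blocked_edges=None):
    """Single-head causal self-attention with optional edge knockout.

    blocked_edges: iterable of (target, source) position pairs to cut;
    blocked edges get a -inf score, so their softmax weight is zero.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: a position may not attend to later positions.
    scores[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf
    # Knockout: sever specific information-flow edges.
    for tgt, src in (blocked_edges or []):
        scores[tgt, src] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy example: a 5-token sequence where index 1 stands in for the
# last-subject position; we cut its edge to the final position (index 4)
# and check that the final position can no longer read from it.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
out, w = attention_with_knockout(q, k, v, blocked_edges=[(4, 1)])
assert w[4, 1] == 0.0  # the knocked-out edge carries no weight
```

In the paper's setting, comparing the model's attribute prediction with and without such a knockout (applied over a window of layers) is what reveals the two critical information-flow points; the positions and layer ranges here are illustrative only.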

Related research

- Linearity of Relation Decoding in Transformer Language Models (08/17/2023). Much of the knowledge encoded in transformer language models (LMs) may b...
- Interpreting Classifiers through Attribute Interactions in Datasets (07/24/2017). In this work we present the novel ASTRID method for investigating which ...
- Unsupervised Relation Extraction from Language Models using Constrained Cloze Completion (10/14/2020). We show that state-of-the-art self-supervised language models can be rea...
- Mass-Editing Memory in a Transformer (10/13/2022). Recent work has shown exciting promise in updating large language models...
- Understanding Transformer Memorization Recall Through Idioms (10/07/2022). To produce accurate predictions, language models (LMs) must balance betw...
- How to Query Language Models? (08/04/2021). Large pre-trained language models (LMs) are capable of not only recoveri...
- Statistical Knowledge Assessment for Generative Language Models (05/17/2023). Generative Language Models (GLMs) have demonstrated capabilities to stor...
