Physics of Language Models: Part 1, Context-Free Grammar

05/23/2023
by Zeyuan Allen-Zhu, et al.

We design experiments to study how generative language models, like GPT, learn context-free grammars (CFGs), diverse language systems with a tree-like structure capturing many aspects of natural languages, programs, and human logic. CFGs are as hard as pushdown automata, and can be ambiguous, so that verifying whether a string satisfies the rules requires dynamic programming. We construct synthetic data and demonstrate that even for very challenging CFGs, pre-trained transformers can learn to generate sentences with near-perfect accuracy and remarkable diversity. More importantly, we delve into the physical principles behind how transformers learn CFGs. We discover that the hidden states within the transformer implicitly and precisely encode the CFG structure (such as placing tree-node information exactly on subtree boundaries), and that the model learns to form "boundary-to-boundary" attention patterns that resemble dynamic programming. We also cover some extensions of CFGs, as well as the robustness of transformers to grammar mistakes. Overall, our research provides a comprehensive and empirical understanding of how transformers learn CFGs, and reveals the physical mechanisms transformers use to capture the structure and rules of languages.
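The dynamic-programming claim above refers to membership checking for (possibly ambiguous) CFGs. As a point of reference only, here is a minimal sketch of the classic CYK algorithm for a grammar in Chomsky normal form; the grammar, rule encoding, and function name below are illustrative choices, not taken from the paper.

```python
from itertools import product

def cyk_parse(tokens, start, unary_rules, binary_rules):
    """CYK dynamic programming: decide whether `tokens` is derivable
    from `start` under a CFG in Chomsky normal form.

    unary_rules:  dict terminal -> set of nonterminals (rules A -> a)
    binary_rules: dict (B, C)   -> set of nonterminals (rules A -> B C)
    """
    n = len(tokens)
    # table[i][j] = set of nonterminals deriving tokens[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(unary_rules.get(tok, ()))
    for span in range(2, n + 1):              # span length
        for i in range(n - span + 1):         # span start
            for split in range(1, span):      # split point inside the span
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for b, c in product(left, right):
                    table[i][span - 1] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]

# Toy ambiguous grammar: S -> S S | a
unary = {"a": {"S"}}
binary = {("S", "S"): {"S"}}
print(cyk_parse(list("aaa"), "S", unary, binary))  # True
```

Because the grammar is ambiguous (a string like "aaa" has multiple parse trees), no simple left-to-right scan suffices; the cubic-time table above is the standard way to verify membership, which is why the task is non-trivial for a language model to learn.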

