Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer

05/08/2023
by Tao Hong, et al.

Large Language Models (LLMs), such as the Generative Pre-trained Transformer (GPT), have achieved tremendous success in various language tasks, but their emergent abilities have also raised many questions, concerns, and challenges that need to be addressed. To gain a better understanding of the models' inner mechanisms, we analyze the hidden state and channel wave dynamics in a small GPT, focusing on the coherence of wave patterns in terms of cross-channel correlation and individual auto-correlation. Our findings suggest that wave dynamics offer consistent and repeatable intrinsic oscillation modes, along with context-aware plasticity and expressiveness in language generation. By analyzing wave patterns, coherence, and clustering, we provide a systematic way to identify and interpret the functionality of the hidden state channels, paving the way to understanding and controlling higher-level language pattern formation. In addition, we investigate the Poisson statistics of spelling errors in text sequence generation across various levels of model training and observe a phase-transition-like process. As coherence builds up, there is a competition between the generation of correct and misspelled words. However, once the model is adequately trained and significant coherence has emerged, the coherent process becomes strong enough to effectively suppress spelling errors, preventing the cascade amplification of defects. The distribution of correct spellings transitions from Poissonian to sub-Poissonian, while the distribution of misspellings shows the opposite trend. By leveraging concepts and techniques from quantum physics, we gain novel insights into the dynamics of the small GPT. This approach can be extended to larger language models that exhibit more complex coherent language patterns, opening up opportunities to interpret their emergent capabilities and develop more specialized models.
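The abstract describes two kinds of measurements: coherence of hidden-state channel "waves" (cross-channel correlation and per-channel auto-correlation along the token axis) and Poissonian vs. sub-Poissonian counting statistics of spelling outcomes. The sketches below illustrate how such quantities could be estimated; they are not the authors' code. The first uses a Hugging Face GPT-2 checkpoint as a stand-in for the paper's small GPT, and the layer index, input text, and channel choice are illustrative assumptions.

```python
# Minimal sketch: cross-channel correlation and per-channel auto-correlation
# of hidden-state activations along the token (sequence) axis.
# Assumption: GPT-2 small as a stand-in for the paper's small GPT.
import numpy as np
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

text = "Coherent wave dynamics in a generative pre-trained transformer."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Hidden states of one intermediate layer: shape (seq_len, hidden_dim).
layer = 6  # illustrative choice, not the paper's setting
h = outputs.hidden_states[layer][0].numpy()

# Cross-channel correlation: Pearson correlation between channel traces
# over the token axis; result has shape (hidden_dim, hidden_dim).
cross_corr = np.corrcoef(h.T)

def autocorr(x, max_lag):
    """Normalized auto-correlation of one channel trace vs. token lag."""
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var
                     for k in range(max_lag)])

# Auto-correlation of a single channel as a function of token lag.
ac = autocorr(h[:, 0], max_lag=min(8, h.shape[0]))
print("cross-channel correlation matrix:", cross_corr.shape)
print("channel-0 auto-correlation:", np.round(ac, 3))
```

The second sketch shows one common way to distinguish Poissonian from sub-Poissonian count distributions, via the Fano factor (variance divided by mean) of per-sequence word counts. The counts here are synthetic placeholders, not data from the paper.

```python
# Minimal sketch: Fano factor as a Poissonian / sub-Poissonian diagnostic.
import numpy as np

# Hypothetical per-sequence counts of correctly spelled words.
correct_counts = np.array([48, 51, 50, 49, 52, 50, 51, 49])

fano = correct_counts.var(ddof=1) / correct_counts.mean()
# Fano ~ 1: Poissonian; Fano < 1: sub-Poissonian (suppressed fluctuations);
# Fano > 1: super-Poissonian.
print(f"Fano factor: {fano:.3f}")
```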
