Transient Chaos in BERT

06/06/2021
by Katsuma Inoue, et al.

Language is an outcome of our complex and dynamic human interactions, and natural language processing (NLP) is hence built on human linguistic activities. Bidirectional Encoder Representations from Transformers (BERT) has recently gained popularity by establishing state-of-the-art scores on several NLP benchmarks. A Lite BERT (ALBERT) is, as its name suggests, a lightweight version of BERT in which the number of parameters is reduced by repeatedly applying the same neural network, the Transformer encoder layer. By pre-training its parameters on a massive amount of natural language data, ALBERT can convert input sentences into versatile high-dimensional vectors potentially capable of solving multiple NLP tasks. In that sense, ALBERT can be regarded as a well-designed high-dimensional dynamical system whose operator is the Transformer encoder, and essential structures of human language are thus expected to be encapsulated in its dynamics. In this study, we investigated the embedded properties of ALBERT to reveal how NLP tasks are effectively solved by exploiting its dynamics, thereby aiming to explore the nature of human language through the dynamical expressions of the NLP model. Our short-term analysis showed that the pre-trained model stably yields trajectories with higher dimensionality, which would enhance the expressive capacity required for NLP tasks. Our long-term analysis revealed that ALBERT intrinsically exhibits transient chaos, a typical nonlinear phenomenon in which chaotic dynamics appear only during a transient, and that the pre-trained ALBERT model produces chaotic trajectories for a significantly longer period than a randomly initialized one. Our results imply that local chaoticity may contribute to improving NLP performance, uncovering a novel aspect of the role of chaotic dynamics in human language behaviors.
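
The dynamical-system view above can be made concrete with a small numerical sketch. The snippet below is not the authors' code: a fixed random tanh map stands in for ALBERT's shared Transformer encoder layer, the participation ratio of a trajectory's covariance spectrum is used as one possible proxy for trajectory dimensionality (the short-term analysis), and the separation of two initially nearby states under repeated application of the map is used as a proxy for (transient) chaoticity (the long-term analysis). All names and parameter values are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): treat the shared encoder layer as a
# fixed nonlinear map f and apply it repeatedly to a hidden state. A random tanh
# map stands in for the pre-trained layer.
import numpy as np

rng = np.random.default_rng(0)
dim = 128          # stand-in for ALBERT's hidden size
steps = 200        # number of repeated applications of the shared layer
W = rng.normal(scale=1.5 / np.sqrt(dim), size=(dim, dim))  # fixed random weights

def f(h):
    """One application of the stand-in 'encoder layer' (a fixed nonlinear map)."""
    return np.tanh(W @ h)

# --- Short-term diagnostic: effective dimensionality of a trajectory -----------
h = rng.normal(size=dim)
traj = []
for _ in range(steps):
    h = f(h)
    traj.append(h)
traj = np.array(traj)
cov_eigs = np.linalg.eigvalsh(np.cov(traj.T))
participation_ratio = cov_eigs.sum() ** 2 / (cov_eigs ** 2).sum()
print(f"effective trajectory dimensionality ~ {participation_ratio:.1f} / {dim}")

# --- Long-term diagnostic: divergence of two initially nearby trajectories -----
h1 = rng.normal(size=dim)
h2 = h1 + 1e-8 * rng.normal(size=dim)   # tiny perturbation of the initial state
for t in range(steps):
    h1, h2 = f(h1), f(h2)
    if t % 20 == 0:
        print(f"step {t:3d}: separation = {np.linalg.norm(h1 - h2):.3e}")
# Exponential growth of the separation that later saturates or decays is the
# signature of (transient) chaos that the study looks for in the trained model.
```

Swapping `f` for a forward pass through the shared encoder layer of an actual pre-trained ALBERT checkpoint would yield the kind of measurement described in the abstract; the random map here only illustrates the procedure.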

Related research

- HUBERT Untangles BERT to Improve Transfer across NLP Tasks (10/25/2019)
- BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks (09/13/2020)
- Pre-Training with Whole Word Masking for Chinese BERT (06/19/2019)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study (03/02/2023)
- Transformers: "The End of History" for NLP? (04/09/2021)
- BERTino: an Italian DistilBERT model (03/31/2023)
- Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality (05/07/2021)
