The geometry of hidden representations of large transformer models

02/01/2023
by Lucrezia Valeriani, et al.

Large transformers are powerful architectures for the self-supervised analysis of data of diverse nature, ranging from protein sequences to text to images. In these models, the data representations in the hidden layers live in the same space, and the semantic structure of the dataset emerges through a sequence of functionally identical transformations between one representation and the next. Here we characterize the geometric and statistical properties of these representations, focusing on how such properties evolve across the layers. By analyzing geometric observables such as the intrinsic dimension (ID) and the neighbor composition, we find that the representations evolve in a strikingly similar manner in transformers trained on protein language tasks and on image reconstruction tasks. In the first layers, the data manifold expands, becoming high-dimensional, and then contracts significantly in the intermediate layers. In the last part of the model, the ID remains approximately constant or forms a second shallow peak. We show that the semantic complexity of the dataset emerges at the end of the first peak, and that this phenomenon can be observed across many models trained on diverse datasets. Based on these observations, we suggest using the ID profile as an unsupervised proxy to identify the layers most suitable for downstream learning tasks.
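For concreteness, the sketch below shows how a per-layer ID profile of this kind could be computed from hidden-state matrices, using the TwoNN estimator (Facco et al., 2017) as a stand-in. The helper names two_nn_id and id_profile are hypothetical, and the paper's actual pipeline may rely on a different estimator or library; this is a minimal illustration, not the authors' implementation.

# Minimal sketch: TwoNN intrinsic-dimension estimate, one value per layer.
# Assumes each layer's hidden states are collected into an array of shape
# (n_samples, hidden_dim); two_nn_id and id_profile are hypothetical helpers.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_id(X, discard_fraction=0.1):
    """Estimate the intrinsic dimension of the point cloud X with TwoNN."""
    nn = NearestNeighbors(n_neighbors=3).fit(X)
    dist, _ = nn.kneighbors(X)            # dist[:, 0] is the point itself
    mu = dist[:, 2] / dist[:, 1]          # ratio of 2nd to 1st neighbor distance
    mu = np.sort(mu)[: int(len(mu) * (1 - discard_fraction))]  # drop outlier ratios
    return len(mu) / np.sum(np.log(mu))   # maximum-likelihood ID estimate

def id_profile(hidden_states):
    """hidden_states: list of (n_samples, hidden_dim) arrays, one per layer."""
    return [two_nn_id(h) for h in hidden_states]

# Usage example: a 5-dimensional manifold linearly embedded in 128 dimensions,
# for which the estimate should be close to 5.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(2000, 5))
    X = latent @ rng.normal(size=(5, 128))
    print(f"estimated ID approx {two_nn_id(X):.1f}")

Applied to the list of hidden-state matrices extracted from consecutive layers, id_profile would return the layer-by-layer curve whose first peak and subsequent contraction the abstract describes.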

