Is Anisotropy Inherent to Transformers?

06/13/2023
by Nathan Godey, et al.

The representation degeneration problem is a phenomenon widely observed among self-supervised learning methods based on Transformers. In NLP, it takes the form of anisotropy: hidden representations end up unexpectedly close to one another in angular distance (cosine similarity). Recent works tend to show that anisotropy is a consequence of optimizing the cross-entropy loss on long-tailed token distributions. In this paper, we show that anisotropy can also be observed empirically in language models trained with objectives that should not suffer directly from the same consequences. We also show that the anisotropy problem extends to Transformers trained on other modalities. Our observations suggest that anisotropy may in fact be inherent to Transformer-based models.
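To make the notion concrete, here is a minimal sketch (not taken from the paper) that estimates anisotropy as the average pairwise cosine similarity between last-layer token representations of a pretrained Transformer. It assumes the Hugging Face transformers library and PyTorch; the gpt2 checkpoint and the example sentences are arbitrary choices. Isotropic representations would yield a value near 0, while values close to 1 indicate strong anisotropy.

```python
# Sketch: measure anisotropy as the mean pairwise cosine similarity
# of a Transformer's hidden states. Model choice and inputs are
# illustrative, not the paper's exact protocol.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # any pretrained Transformer encoder/decoder works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Anisotropic hidden states point in similar directions.",
]

with torch.no_grad():
    reps = []
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        # Last-layer hidden states: shape (1, seq_len, hidden_dim)
        hidden = model(**inputs).last_hidden_state.squeeze(0)
        reps.append(hidden)
    h = torch.cat(reps, dim=0)  # (n_tokens, hidden_dim)

# Normalize to unit length so the dot product equals cosine similarity,
# then average over the off-diagonal pairs (excluding self-similarity).
h = torch.nn.functional.normalize(h, dim=-1)
cos = h @ h.T
n = cos.shape[0]
anisotropy = (cos.sum() - n) / (n * (n - 1))
print(f"Average pairwise cosine similarity: {anisotropy.item():.3f}")
```

For a model with isotropic (uniformly spread) representations, this average would hover around zero; the paper's point is that trained Transformers consistently produce values well above that, across objectives and modalities.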


Related research

04/11/2023 - MOST: Multiple Object localization with Self-supervised Transformers for object discovery
We tackle the challenging task of unsupervised object localization in th...

10/19/2020 - ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
GNNs and chemical fingerprints are the predominant approaches to represe...

08/20/2023 - Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation
Transformer and its variants are a powerful class of architectures for s...

06/08/2022 - CASS: Cross Architectural Self-Supervision for Medical Image Analysis
Recent advances in Deep Learning and Computer Vision have alleviated man...

08/24/2022 - Addressing Token Uniformity in Transformers via Singular Value Transformation
Token uniformity is commonly observed in transformer-based models, in wh...

12/05/2022 - Learning Imbalanced Data with Vision Transformers
The real-world data tends to be heavily imbalanced and severely skew the...

02/01/2023 - The geometry of hidden representations of large transformer models
Large transformers are powerful architectures for self-supervised analys...
