Stable Anisotropic Regularization

05/30/2023
by William Rudman, et al.

Given the success of Large Language Models (LLMs), there has been considerable interest in studying the properties of model activations. The literature overwhelmingly agrees that LLM representations are dominated by a few “outlier dimensions” with exceedingly high variance and magnitude. Several studies in Natural Language Processing (NLP) have sought to mitigate the impact of such outlier dimensions and force LLMs to be isotropic (i.e., have uniform variance across all dimensions in embedding space). Isotropy is thought to be a desirable property for LLMs that improves model performance and more closely aligns textual representations with human intuition. However, many of the claims regarding isotropy in NLP have been based on the average cosine similarity of embeddings, which has recently been shown to be a flawed measure of isotropy. In this paper, we propose I-STAR: IsoScore⋆-based STable Anisotropic Regularization, a novel regularization method that can be used to increase or decrease levels of isotropy in embedding space during training. I-STAR uses IsoScore⋆, the first accurate measure of isotropy that is both differentiable and stable on mini-batch computations. In contrast to several previous works, we find that decreasing isotropy in contextualized embeddings improves performance on the majority of tasks and models considered in this paper.
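To make the regularization idea concrete, here is a minimal PyTorch sketch of the general recipe: compute a differentiable isotropy score for a mini-batch of embeddings from the eigenvalue spectrum of their covariance matrix, then add it to the task loss with a tunable weight whose sign controls whether training increases or decreases isotropy. The function names (isotropy_proxy, regularized_loss) and the entropy-based score are illustrative assumptions; this is not the paper's IsoScore⋆ formula, which is specifically designed to remain accurate and stable on mini-batch computations.

```python
# Illustrative sketch only: a generic differentiable isotropy proxy plus a
# signed regularizer. This is NOT the paper's IsoScore* definition.
import math
import torch


def isotropy_proxy(embeddings: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Score in (0, 1]: near 1 when variance is spread uniformly across all
    principal directions (isotropic), near 0 when a few outlier directions
    dominate (anisotropic)."""
    x = embeddings - embeddings.mean(dim=0, keepdim=True)   # center the batch
    cov = (x.T @ x) / max(x.shape[0] - 1, 1)                # (d, d) sample covariance
    eigvals = torch.linalg.eigvalsh(cov).clamp_min(eps)     # real, non-negative spectrum
    p = eigvals / eigvals.sum()                             # eigenvalue distribution
    entropy = -(p * p.log()).sum()                          # spectral entropy
    return entropy / math.log(p.numel())                    # normalize to (0, 1]


def regularized_loss(task_loss: torch.Tensor, embeddings: torch.Tensor,
                     lam: float = 0.1) -> torch.Tensor:
    """Task loss plus a signed isotropy term: lam > 0 pushes embeddings toward
    isotropy, lam < 0 rewards anisotropy (the direction the abstract reports
    helping on most tasks)."""
    return task_loss - lam * isotropy_proxy(embeddings)
```

In a training loop, embeddings would typically be the contextualized hidden states for the batch and lam would be tuned per task; the only requirement is that the isotropy score stays differentiable so the penalty can backpropagate into the encoder.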

Related research

06/01/2023 · Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity
Previous work has shown that the representations output by contextual la...

05/23/2022 · Outlier Dimensions that Disrupt Transformers Are Driven by Frequency
Transformer-based language models are known to display anisotropic behav...

05/29/2017 · Neural Embeddings of Graphs in Hyperbolic Space
Neural embeddings have been used with great success in Natural Language ...

09/09/2021 · All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
Similarity measures are a vital tool for understanding how language mode...

10/05/2019 · On Dimensional Linguistic Properties of the Word Embedding Space
Word embeddings have become a staple of several natural language process...

04/18/2021 · Embedding-Enhanced Giza++: Improving Alignment in Low- and High-Resource Scenarios Using Embedding Space Geometry
A popular natural language processing task decades ago, word alignment h...

11/23/2022 · Relating Regularization and Generalization through the Intrinsic Dimension of Activations
Given a pair of models with similar training set performance, it is natu...
