All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality

09/09/2021
by William Timkey, et al.

Similarity measures are a vital tool for understanding how language models represent and process language. Standard representational similarity measures such as cosine similarity and Euclidean distance have been successfully used in static word embedding models to understand how words cluster in semantic space. Recently, these measures have been applied to embeddings from contextualized models such as BERT and GPT-2. In this work, we call into question the informativity of such measures for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures. Moreover, we find a striking mismatch between the dimensions that dominate similarity measures and those which are important to the behavior of the model. We show that simple postprocessing techniques such as standardization are able to correct for rogue dimensions and reveal underlying representational quality. We argue that accounting for rogue dimensions is essential for any similarity-based analysis of contextual language models.
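The central observation can be sketched numerically with synthetic data: a single dimension with an outsized mean dominates pairwise cosine similarity, and per-dimension standardization corrects for it. This is an illustrative sketch, not the paper's exact procedure — the synthetic embeddings, the helper name `contribution_share`, and the use of absolute per-dimension contributions (for numerical stability) are assumptions here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for contextual embeddings: 1000 vectors of
# dimension 768, with dimension 5 given a large shared mean so it
# behaves like a "rogue" dimension (illustrative data only, not
# real BERT/GPT-2 activations).
X = rng.normal(size=(1000, 768))
X[:, 5] += 40.0

def contribution_share(X, n_pairs=5000):
    # Share of each dimension's (absolute) contribution to cosine
    # similarity, averaged over random vector pairs:
    # share_i ∝ E[ |x_i * y_i| / (||x|| * ||y||) ].
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    pairs = rng.integers(0, len(X), size=(n_pairs, 2))
    contrib = np.abs(Xn[pairs[:, 0]] * Xn[pairs[:, 1]]).mean(axis=0)
    return contrib / contrib.sum()

raw = contribution_share(X)
print(f"rogue dim share, raw: {raw[5]:.3f}")  # dominates

# Standardization (z-scoring each dimension) removes the rogue
# dimension's outsized mean and variance, so no single dimension
# dominates the similarity measure anymore.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std = contribution_share(Z)
print(f"rogue dim share, standardized: {std[5]:.5f}")  # near 1/768
```

With the raw vectors, dimension 5 accounts for the majority of the cosine similarity between any pair; after standardization its share falls to roughly 1/768, i.e. in line with every other dimension.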


Related research

- 02/22/2021: Co-occurrences using Fasttext embeddings for word similarity tasks in Urdu
- 04/12/2022: What do Toothbrushes do in the Kitchen? How Transformers Think our World is Structured
- 11/23/2021: Using Distributional Principles for the Semantic Study of Contextual Language Models
- 05/08/2023: ANALOGICAL - A New Benchmark for Analogy of Long Text for Large Language Models
- 10/08/2021: Text analysis and deep learning: A network approach
- 05/30/2023: Stable Anisotropic Regularization
- 10/19/2022: Language Models Understand Us, Poorly
