How to Dissect a Muppet: The Structure of Transformer Embedding Spaces

06/07/2022
by   Timothee Mickus, et al.

Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors, and we showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attention and feed-forward sublayers are not equally useful across downstream applications, along with a quantitative overview of the effects of finetuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector-space anisotropy to attention weights.
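
To make the "sum of vector factors" claim concrete, here is a minimal Python sketch, not the paper's exact procedure: in a pre-LayerNorm transformer, every sublayer writes additively into the residual stream, so a token's final embedding telescopes into its input embedding plus the sum of all attention-sublayer outputs plus the sum of all feed-forward-sublayer outputs. The random weight matrices, dimensions, and variable names below are illustrative stand-ins; the actual paper analyzes BERT-style (post-LayerNorm) models, where the LayerNorm rescaling must additionally be propagated into each term and corrective bias terms appear.

import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 16, 4  # toy embedding size and depth (assumptions)

def layer_norm(x, eps=1e-5):
    mu, sigma = x.mean(-1, keepdims=True), x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

# Stand-ins for the attention and feed-forward sublayers: any functions
# work here, since the decomposition relies only on the residual connections.
attn_weights = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_layers)]
ff_weights = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_layers)]

x = rng.normal(size=d)  # "input embedding" i_t for one token
attn_terms, ff_terms = [], []

h = x
for W_a, W_f in zip(attn_weights, ff_weights):
    a = layer_norm(h) @ W_a  # attention sublayer output (pre-LN convention)
    h = h + a                # residual connection
    f = layer_norm(h) @ W_f  # feed-forward sublayer output
    h = h + f
    attn_terms.append(a)
    ff_terms.append(f)

# e_t = i_t + (sum of attention terms) + (sum of feed-forward terms)
reconstructed = x + sum(attn_terms) + sum(ff_terms)
assert np.allclose(h, reconstructed)
print("max abs reconstruction error:", np.abs(h - reconstructed).max())

Because the reconstruction is exact, each summand can be ablated or inspected in isolation, which is the kind of per-component analysis the abstract describes.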


