Catch the "Tails" of BERT

11/09/2020
by Ziyang Luo, et al.

Contextualized word embeddings have recently outperformed static word embeddings on many NLP tasks, yet we still know little about the internal structure of the representations produced by BERT. Do they share common patterns? What is the relation between word sense and context? We find that nearly all of the contextualized word vectors of BERT and RoBERTa exhibit a common pattern: for BERT, the 557th element is always the smallest; for RoBERTa, the 588th element is always the largest and the 77th is always the smallest. We call these the "tails" of the models and find that they are the major cause of the anisotropy of the vector space. After "cutting the tails", different vectors of the same word become more similar to each other, and the internal representations perform better on the Word-in-Context (WiC) task. These results suggest that cutting the tails reduces the influence of context and better represents word sense.
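As a rough illustration of what "cutting the tails" means in practice, the sketch below uses the Hugging Face transformers library (not necessarily the paper's own code) to zero out the reported outlier dimension of BERT-base and compare the cosine similarity of the same word in two contexts. The tail index (whether the paper counts dimensions from 0 or 1 is left open here), the example sentences, and the choice of the last layer are all assumptions made for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Outlier ("tail") dimension reported for BERT; the 0- vs. 1-based
# indexing here is an assumption made for illustration.
TAIL_DIMS = [556]

def word_vector(sentence, word, cut_tails=False):
    """Last-layer vector of the first subtoken of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]       # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = tokens.index(tokenizer.tokenize(word)[0])      # first subtoken of the word
    vec = hidden[idx].clone()
    if cut_tails:
        vec[TAIL_DIMS] = 0.0                             # "cut the tails"
    return vec

s1 = "The river bank was muddy after the rain."
s2 = "She opened a savings account at the bank."
for cut in (False, True):
    v1 = word_vector(s1, "bank", cut_tails=cut)
    v2 = word_vector(s2, "bank", cut_tails=cut)
    sim = torch.nn.functional.cosine_similarity(v1, v2, dim=0).item()
    print(f"cut_tails={cut}: cosine(bank, bank) = {sim:.4f}")
```

If the paper's finding holds, removing the tail dimension should change the similarity between the two "bank" vectors and, more generally, lower the average similarity between unrelated words, which is the usual symptom of anisotropy.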

research
10/05/2021

Learning Sense-Specific Static Embeddings using Contextualised Word Embeddings as a Proxy

Contextualised word embeddings generated from Neural Language Models (NL...
research
09/02/2019

How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings

Replacing static word embeddings with contextualized word representation...
research
06/02/2023

Word Embeddings for Banking Industry

Applications of Natural Language Processing (NLP) are plentiful, from se...
research
12/17/2019

Analyzing Structures in the Semantic Vector Space: A Framework for Decomposing Word Embeddings

Word embeddings are rich word representations, which in combination with...
research
02/26/2019

Context Vectors are Reflections of Word Vectors in Half the Dimensions

This paper takes a step towards theoretical analysis of the relationship...
research
10/25/2020

Contextualized Word Embeddings Encode Aspects of Human-Like Word Sense Knowledge

Understanding context-dependent variation in word meanings is a key aspe...
research
12/04/2020

Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings

While the success of pre-trained language models has largely eliminated ...
