Shape of Elephant: Study of Macro Properties of Word Embeddings Spaces

06/13/2021
by   Alexey Tikhonov, et al.

Pre-trained word representations have become a key component in many NLP tasks. However, the global geometry of word embedding spaces remains poorly understood. In this paper, we demonstrate that a typical word embedding cloud is shaped as a high-dimensional simplex with interpretable vertices, and we propose a simple yet effective method for enumerating these vertices. We show that the proposed method can detect and describe the vertices of the simplex for GloVe and fastText spaces.
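The abstract does not spell out the enumeration method, but the core idea (a point cloud concentrated in a simplex, whose extreme points are the interesting objects) can be illustrated with a toy sketch. The snippet below is an assumption-laden stand-in, not the paper's algorithm: it samples points inside a triangle (a 2-simplex) and recovers candidate vertices with a greedy farthest-point heuristic.

```python
import numpy as np

def farthest_point_vertices(X, k):
    """Greedy farthest-point heuristic: pick k mutually distant points.
    For a cloud concentrated in a simplex, these tend to lie near the
    vertices. Illustrative only; not the method proposed in the paper."""
    # Start from the point farthest from the centroid.
    idx = [int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-picked point.
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :],
                                  axis=-1), axis=1)
        idx.append(int(np.argmax(d)))  # maximin: farthest from the picked set
    return idx

# Toy demo: 500 points uniform in a triangle with known vertices.
rng = np.random.default_rng(0)
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])  # true vertices
w = rng.dirichlet([1.0, 1.0, 1.0], size=500)         # barycentric weights
X = w @ V
picked = X[farthest_point_vertices(X, 3)]            # ~ one point per corner
```

In a real embedding space one would run such a procedure on the (possibly centered and normalized) word vectors and then inspect the nearest-neighbor words of each recovered vertex to interpret it.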

Related research

- 05/16/2020 · RPD: A Distance Function Between Word Embeddings
  It is well-understood that different algorithms, training processes, and...
- 05/04/2022 · Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem
  Word embeddings are one of the most fundamental technologies used in nat...
- 04/18/2019 · Analytical Methods for Interpretable Ultradense Word Embeddings
  Word embeddings are useful for a wide variety of tasks, but they lack in...
- 12/03/2019 · Geometry of martensite needles in shape memory alloys
  We study the geometry of needle-shaped domains in shape-memory alloys. N...
- 03/09/2017 · What can you do with a rock? Affordance extraction via word embeddings
  Autonomous agents must often detect affordances: the set of behaviors en...
- 06/23/2020 · Supervised Understanding of Word Embeddings
  Pre-trained word embeddings are widely used for transfer learning in nat...
- 06/05/2019 · Entity-Centric Contextual Affective Analysis
  While contextualized word representations have improved state-of-the-art...
