IsoScore: Measuring the Uniformity of Vector Space Utilization

08/16/2021
by   William Rudman, et al.

The recent success of distributed word representations has led to an increased interest in analyzing the properties of their spatial distribution. Current metrics suggest that contextualized word embedding models do not uniformly utilize all dimensions when embedding tokens in vector space. Here we argue that existing metrics are fragile and tend to obfuscate the true spatial distribution of point clouds. To ameliorate this issue, we propose IsoScore: a novel metric that quantifies the degree to which a point cloud uniformly utilizes the ambient vector space. We demonstrate that IsoScore has several desirable properties that existing scores lack, such as mean invariance and direct correspondence to the number of dimensions used. Furthermore, IsoScore is conceptually intuitive and computationally efficient, making it well suited for analyzing the distribution of point clouds in arbitrary vector spaces, not limited to word embeddings alone. Additionally, we use IsoScore to demonstrate that a number of recent conclusions in the NLP literature, derived using brittle metrics of spatial distribution such as average cosine similarity, may be incomplete or altogether inaccurate.
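The abstract does not spell out the IsoScore formula, but its central criticism of existing metrics (lack of mean invariance) is easy to illustrate. The minimal sketch below, with hypothetical function names, shows how average cosine similarity changes drastically when a point cloud is merely shifted away from the origin, while a simple covariance-based proxy for uniform dimension use (the min/max eigenvalue ratio, used here only as a stand-in and not as the actual IsoScore computation) is unaffected by the shift.

```python
import numpy as np

def average_cosine_similarity(points):
    """Mean pairwise cosine similarity of the rows of `points`."""
    unit = points / np.linalg.norm(points, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(points)
    # Average over distinct pairs only (exclude the diagonal of ones).
    return (sims.sum() - n) / (n * (n - 1))

def covariance_uniformity(points):
    """Illustrative, mean-invariant proxy for uniform dimension use:
    the ratio of the smallest to the largest eigenvalue of the sample
    covariance matrix (1.0 = all dimensions used equally, near 0.0 = a
    dimension is essentially unused). This is NOT the IsoScore formula."""
    eigvals = np.linalg.eigvalsh(np.cov(points, rowvar=False))
    return eigvals.min() / eigvals.max()

rng = np.random.default_rng(0)
cloud = rng.normal(size=(2000, 3))             # isotropic Gaussian cloud
shifted = cloud + np.array([10.0, 0.0, 0.0])   # same shape, shifted mean

print(average_cosine_similarity(cloud))    # close to 0.0
print(average_cosine_similarity(shifted))  # close to 1.0, despite identical shape
print(covariance_uniformity(cloud))        # close to 1.0
print(covariance_uniformity(shifted))      # unchanged by the mean shift
```

Under this toy setup, average cosine similarity would report the shifted cloud as highly "anisotropic" even though its spatial distribution is identical to the centered one, which is the kind of brittleness the paper argues a mean-invariant metric like IsoScore avoids.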


Related research

06/24/2016 · Issues in evaluating semantic spaces using word analogies
The offset method for solving word analogies has become a standard evalu...

04/24/2015 · Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space
There is rising interest in vector-space word embeddings and their use i...

04/09/2019 · Characterizing the impact of geometric properties of word embeddings on task performance
Analysis of word embedding properties to inform their use in downstream ...

06/09/2015 · WordRank: Learning Word Embeddings via Robust Ranking
Embedding words in a vector space has gained a lot of attention in recen...

03/07/2020 · Discovering linguistic (ir)regularities in word embeddings through max-margin separating hyperplanes
We experiment with new methods for learning how related words are positi...

09/01/2020 · Document Similarity from Vector Space Densities
We propose a computationally light method for estimating similarities be...

06/04/2018 · Absolute Orientation for Word Embedding Alignment
We propose a new technique to align word embeddings which are derived fr...
