On Dimensional Linguistic Properties of the Word Embedding Space

10/05/2019
by Vikas Raunak, et al.

Word embeddings have become a staple of many natural language processing tasks, yet much remains to be understood about their properties. In this work, we analyze word embeddings in terms of their principal components and arrive at a number of novel conclusions. In particular, we characterize the utility of variance explained by the principal components (widely used as a fundamental tool to assess the quality of the resulting representations) as a proxy for downstream performance. Further, through dimensional linguistic probing of the embedding space, we show that the syntactic information captured by a principal component does not depend on the amount of variance it explains. Consequently, we investigate the limitations of variance-based embedding post-processing techniques and demonstrate that such post-processing is counterproductive in a number of scenarios, such as sentence classification and machine translation. Finally, we offer a few guidelines on variance-based embedding post-processing. We have released the source code along with the paper.
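As a concrete illustration of the variance-based analysis and post-processing the abstract refers to, the sketch below computes the variance explained by each principal component of an embedding matrix and removes projections onto the top components, in the spirit of common PCA-based post-processing. The function names, the use of scikit-learn, and the choice of how many components to remove (d) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA

def variance_profile(embeddings):
    """Fraction of total variance explained by each principal component."""
    pca = PCA()  # keeps min(n_samples, n_features) components by default
    pca.fit(embeddings)
    return pca.explained_variance_ratio_

def remove_top_components(embeddings, d=1):
    """Subtract projections onto the top-d principal components,
    a common variance-based post-processing step (d is illustrative)."""
    centered = embeddings - embeddings.mean(axis=0)
    pca = PCA(n_components=d)
    pca.fit(centered)
    top = pca.components_              # shape: (d, embedding_dim)
    return centered - centered @ top.T @ top

# Demo on random stand-in vectors; a real run would load trained embeddings.
vecs = np.random.randn(1000, 300)
print(variance_profile(vecs)[:5])      # variance explained by the first 5 PCs
post = remove_top_components(vecs, d=2)
```

Note that removing the highest-variance directions is exactly the kind of step the paper cautions about: if syntactic information is not concentrated in the high-variance components, discarding them can hurt downstream tasks such as sentence classification and machine translation.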

Related research

08/20/2018: Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding
Although embedded vector representations of words offer impressive perfo...

09/30/2020: Interactive Re-Fitting as a Technique for Improving Word Embeddings
Word embeddings are a fixed, distributional representation of the contex...

05/27/2019: An Empirical Study on Post-processing Methods for Word Embeddings
Word embeddings learnt from large corpora have been adopted in various a...

10/25/2020: Autoencoding Improves Pre-trained Word Embeddings
Prior work investigating the geometry of pre-trained word embeddings hav...

01/15/2013: The Expressive Power of Word Embeddings
We seek to better understand the difference in quality of the several pu...

05/30/2023: Stable Anisotropic Regularization
Given the success of Large Language Models (LLMs), there has been consid...

04/13/2021: DirectProbe: Studying Representations without Classifiers
Understanding how linguistic structures are encoded in contextualized em...
