An Empirical Study on Post-processing Methods for Word Embeddings

05/27/2019
by Shuai Tang, et al.

Word embeddings learnt from large corpora have been adopted in various natural language processing applications and serve as general input representations to learning systems. Recently, a series of post-processing methods have been proposed to boost the performance of word embeddings on similarity comparison and analogy retrieval tasks, and some have been adapted to compose sentence representations. The general hypothesis behind these methods is that by enforcing the embedding space to be more isotropic, the similarity between words can be better expressed. We view these methods as approaches to shrinking the covariance/gram matrix, estimated from the learnt word vectors, towards a scaled identity matrix. By optimising an objective on the semi-Riemannian manifold with Centralised Kernel Alignment (CKA), we are able to search for the optimal shrinkage parameter, and we provide a post-processing method that smooths the spectrum of the learnt word vectors and yields improved performance on downstream tasks.
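To make the shrinkage view concrete, the following is a minimal NumPy sketch of the idea as the abstract describes it: the empirical covariance of the word vectors is pulled towards a scaled identity, which flattens (smooths) its eigenspectrum, and linear CKA is one standard way to compare the original and post-processed representations. The function names are illustrative assumptions, and the fixed choice of shrinkage parameter below is a stand-in; the paper's actual optimisation on the semi-Riemannian manifold is not reproduced here.

    import numpy as np

    def smooth_spectrum(X, alpha):
        # Shrink the covariance of word vectors X (n_words x dim) towards a
        # scaled identity, (1 - alpha) * C + alpha * mu * I, where mu is the
        # mean eigenvalue of C, then re-embed X to match the shrunk spectrum.
        n = X.shape[0]
        Xc = X - X.mean(axis=0)                      # centre the embeddings
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        lam = s ** 2 / n                             # eigenvalues of the covariance
        lam_shrunk = (1 - alpha) * lam + alpha * lam.mean()
        return U @ np.diag(np.sqrt(n * lam_shrunk)) @ Vt

    def linear_cka(X, Y):
        # Linear Centralised Kernel Alignment between two representations
        # of the same words (rows aligned).
        Xc = X - X.mean(axis=0)
        Yc = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(Xc.T @ Yc, 'fro') ** 2
        return hsic / (np.linalg.norm(Xc.T @ Xc, 'fro')
                       * np.linalg.norm(Yc.T @ Yc, 'fro'))

    # Illustrative usage with random vectors standing in for learnt embeddings.
    X = np.random.randn(10000, 300)
    X_smooth = smooth_spectrum(X, alpha=0.3)         # alpha chosen arbitrarily here
    print(linear_cka(X, X_smooth))                   # similarity to the original space

Note that alpha = 0 leaves the embeddings unchanged while alpha = 1 whitens them up to scale; the paper's contribution is a principled, CKA-based way of choosing this parameter rather than setting it by hand.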


Related research

- 09/30/2020: Interactive Re-Fitting as a Technique for Improving Word Embeddings
  Word embeddings are a fixed, distributional representation of the contex...

- 08/11/2017: Simple and Effective Dimensionality Reduction for Word Embeddings
  Word embeddings have become the basic building blocks for several natura...

- 10/05/2019: On Dimensional Linguistic Properties of the Word Embedding Space
  Word embeddings have become a staple of several natural language process...

- 11/17/2018: Unsupervised Post-processing of Word Vectors via Conceptor Negation
  Word vectors are at the core of many natural language processing tasks. ...

- 08/20/2018: Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding
  Although embedded vector representations of words offer impressive perfo...

- 12/14/2019: Integrating Lexical Knowledge in Word Embeddings using Sprinkling and Retrofitting
  Neural network based word embeddings, such as Word2Vec and GloVe, are pu...

- 01/14/2020: Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning
  Word embeddings, i.e., low-dimensional vector representations such as Gl...
