All-but-the-Top: Simple and Effective Postprocessing for Word Representations

02/05/2017
by Jiaqi Mu, et al.

Real-valued word representations have transformed NLP applications; popular examples such as word2vec and GloVe are recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a very simple, and yet counter-intuitive, postprocessing technique -- eliminating the common mean vector and a few top dominating directions from the word vectors -- that renders off-the-shelf representations even stronger. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level extrinsic tasks (semantic textual similarity), on multiple datasets, with a variety of representation methods and hyperparameter choices, and in multiple languages; in each case, the processed representations are consistently better than the original ones. Furthermore, we demonstrate quantitatively that in downstream applications, neural network architectures "automatically learn" the postprocessing operation.
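The technique described in the abstract can be sketched in a few lines of NumPy: subtract the common mean vector from all word vectors, then remove their projections onto the top principal components. The function name and the default of D ≈ d/100 components follow the paper's suggested heuristic; the rest of the interface is an illustrative assumption, not the authors' reference implementation.

```python
import numpy as np

def all_but_the_top(embeddings, n_components=None):
    """Postprocess word vectors by removing the common mean vector and
    the projections onto the top dominating directions.

    embeddings: array of shape (n_words, dim).
    n_components: number of top directions to remove; defaults to the
    paper's heuristic of dim // 100 (assumption: at least 1).
    """
    n_words, dim = embeddings.shape
    if n_components is None:
        n_components = max(1, dim // 100)

    # Step 1: remove the common mean vector.
    mu = embeddings.mean(axis=0)
    centered = embeddings - mu

    # Step 2: find the top principal directions via SVD of the
    # centered matrix (rows of vt are the principal directions).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]  # shape (n_components, dim)

    # Step 3: subtract each vector's projection onto those directions.
    return centered - centered @ top.T @ top
```

After this transform, the vectors have zero mean and zero variance along the removed directions, which is exactly the "more isotropic" property the paper argues improves both intrinsic and extrinsic task performance.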


