Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation

09/06/2018
by Mikel Artetxe, et al.

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness. In this paper, we show that each embedding model captures more information than is directly apparent. A linear transformation that adjusts the similarity order of the model without any external resource can tailor it to achieve better results in those aspects, providing a new perspective on how embeddings encode divergent linguistic information. In addition, we explore the relation between intrinsic and extrinsic evaluation, as the effect of our transformations on downstream tasks is larger for unsupervised systems than for supervised ones.
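
The abstract does not spell out the transformation, so the following is only a minimal sketch of one way a linear map can change the similarity order of an embedding matrix. It assumes dot-product similarity, treats the second-order similarity as the product of the first-order similarity matrix with itself, and builds the map from the eigendecomposition of X^T X. The function name `similarity_order_transform`, the exponent parameter `alpha`, and the random toy matrix are illustrative assumptions, not taken from the page.

```python
import numpy as np

def similarity_order_transform(X, alpha):
    """Return a d x d matrix W that reweights the similarity structure of X.

    Assumption: X is an (n_words x d) embedding matrix and similarity is the
    dot product, so the similarity matrix of X is X @ X.T.
    """
    # Eigendecomposition of the symmetric matrix X^T X = Q diag(lam) Q^T.
    lam, Q = np.linalg.eigh(X.T @ X)
    # Guard against tiny negative eigenvalues caused by round-off.
    lam = np.clip(lam, 1e-12, None)
    # Scale column j of Q by lam[j] ** alpha, i.e. Q @ diag(lam ** alpha).
    return Q * lam ** alpha

# Toy data standing in for real embeddings (hypothetical, random values).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))

# alpha = 0.5 should turn first-order similarities into second-order ones.
W = similarity_order_transform(X, 0.5)
Y = X @ W

first_order = X @ X.T
second_order = first_order @ first_order
assert np.allclose(Y @ Y.T, second_order)
```

In this sketch, `alpha = 0` leaves the similarities unchanged (W is just a rotation), positive values move toward higher-order similarity, and negative values move in the opposite direction. No external resource is used: W depends only on the embeddings themselves.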

research
07/04/2019

Morphological Word Embeddings

Linguistic similarity is multi-faceted. For instance, two words may be s...
research
02/18/2019

CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model

Continuous Bag of Words (CBOW) is a powerful text embedding method. Due ...
research
05/23/2018

Embedding Syntax and Semantics of Prepositions via Tensor Decomposition

Prepositions are among the most frequent words in English and play compl...
research
04/13/2021

On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings

In essence, embedding algorithms work by optimizing the distance between...
research
07/01/2016

Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Word embeddings have recently seen a strong increase in interest as a re...
research
08/25/2019

Don't Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja

We propose a simple approach to train better Korean word representations...
research
09/04/2017

Hypothesis Testing based Intrinsic Evaluation of Word Embeddings

We introduce the cross-match test - an exact, distribution free, high-di...