Novel Ranking-Based Lexical Similarity Measure for Word Embedding

12/22/2017
by Jakub Dutkiewicz, et al.

Distributional semantics models derive a word space from linguistic items in their contexts. Meaning is obtained by defining a distance measure between the vectors corresponding to lexical entities. Such vectors present several problems. In this paper we provide a guideline for post-processing improvements to the baseline vectors. We focus on refining the similarity aspect: we address imperfections of the model by applying a hubness reduction method, implement relational knowledge in the model, and provide a new ranking-based similarity definition that gives maximum weight to the top-1 component value. This feature ranking is similar to the one used in information retrieval. All these enrichments outperform the results reported in the current literature on the joint ESL and TOEFL test sets. Since single-word embeddings are a basic element of any semantic task, one can expect a significant improvement in results on these tasks. Moreover, our improved text-processing method can be transferred to continuous distributed representations of biological sequences for deep proteomics and genomics.
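The abstract does not state the formula behind the ranking similarity. As a purely illustrative sketch, assuming a DCG-style logarithmic discount borrowed from information retrieval (this is not necessarily the authors' definition), a rank-based similarity that gives the largest weight to the top-1 component could look like the following. The names `ranking_similarity`, `top_k_dims`, `discount` and the parameter `k` are hypothetical and introduced only for this example.

```python
import math
from typing import List, Sequence


def top_k_dims(vec: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest components of vec, largest first."""
    return sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)[:k]


def discount(rank: int) -> float:
    """Assumed DCG-style discount: rank 0 (the top-1 component) gets
    weight 1.0 and every later rank gets a strictly smaller weight."""
    return 1.0 / math.log2(rank + 2)


def ranking_similarity(u: Sequence[float], v: Sequence[float], k: int = 10) -> float:
    """Hypothetical rank-based similarity between two embedding vectors.

    Compares the top-k component rankings of u and v. A dimension found in
    both rankings contributes discount(rank_in_u) * discount(rank_in_v), so
    agreement on the top-1 component counts the most. The score is divided
    by the value obtained for identical rankings, so it lies in [0, 1].
    """
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, len(u))
    rank_v = {dim: r for r, dim in enumerate(top_k_dims(v, k))}
    score = sum(
        discount(r) * discount(rank_v[dim])
        for r, dim in enumerate(top_k_dims(u, k))
        if dim in rank_v
    )
    ideal = sum(discount(r) ** 2 for r in range(k))
    return score / ideal
```

With `k=2`, two vectors that share only their top-ranked dimension score about 0.72, while two that share only their second-ranked dimension score about 0.28; this asymmetry is what "maximum weight to the top-1 component" means under the assumed discount.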

research
06/06/2017

Synergistic Union of Word2Vec and Lexicon for Domain Specific Semantic Similarity

Semantic similarity measures are an important part in Natural Language P...
research
05/25/2016

Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction

We propose a novel vector representation that integrates lexical contras...
research
02/01/2020

Concept Embedding for Information Retrieval

Concepts are used to solve the term-mismatch problem. However, we need a...
research
10/03/2016

Grounding the Lexical Sets of Causative-Inchoative Verbs with Word Embedding

Lexical sets contain the words filling the argument positions of a verb ...
research
01/23/2017

dna2vec: Consistent vector representations of variable-length k-mers

One of the ubiquitous representation of long DNA sequence is dividing it...
research
06/20/2016

Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Word embedding, specially with its recent developments, promises a quant...
research
08/22/2019

Unsupervised Lemmatization as Embeddings-Based Word Clustering

We focus on the task of unsupervised lemmatization, i.e. grouping togeth...