What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets

03/29/2016
by Enrico Santus, et al.

In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting this intersection according to the rank of the shared contexts in the dependency ranked lists. This claim comes from the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts than other related words do. To prove it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms Vector Cosine and co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
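
The measure described in the abstract can be illustrated with a minimal sketch. The abstract only says that the intersection of the top contexts is weighted by rank, so the exact weight used here (the inverse of the average of the two ranks) is an assumption, as are the function names, the cutoff parameter `n`, and the toy association scores, which stand in for whatever association measure ranks the contexts.

```python
from typing import Dict


def top_contexts(scores: Dict[str, float], n: int) -> Dict[str, int]:
    """Map each of the n highest-scoring contexts to its 1-based rank."""
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))[:n]
    return {context: rank for rank, context in enumerate(ranked, start=1)}


def apsyn(scores_a: Dict[str, float], scores_b: Dict[str, float], n: int = 1000) -> float:
    """Rank-weighted overlap of the top-n contexts of two target words.

    Assumption: each shared context contributes the inverse of the
    average of its ranks in the two lists.
    """
    ranks_a = top_contexts(scores_a, n)
    ranks_b = top_contexts(scores_b, n)
    shared = ranks_a.keys() & ranks_b.keys()
    return sum((1.0 / ((ranks_a[c] + ranks_b[c]) / 2.0) for c in shared), 0.0)


# Hypothetical association scores (context -> strength), for illustration only.
nerd = {"computer": 5.0, "glasses": 4.0, "study": 3.0, "party": 0.5}
geek = {"computer": 4.5, "study": 4.0, "glasses": 2.0, "music": 1.0}
jock = {"party": 5.0, "sport": 4.0, "team": 3.0, "study": 0.2}

print(apsyn(nerd, geek, n=3))  # all three top contexts are shared
print(apsyn(nerd, jock, n=3))  # no top context is shared
```

With these toy scores and `n=3`, the first pair shares every top context and scores roughly 1.8, while the second pair shares none and scores 0. Contexts ranked high in both lists contribute the most, which is the intuition behind the hypothesis that similar words share their most relevant contexts.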


